CONVERGENCE RATE AND AVERAGING OF
NONLINEAR TWO-TIME-SCALE STOCHASTIC
APPROXIMATION ALGORITHMS

By Abdelkader Mokkadem and Mariane Pelletier 

University of Versailles-Saint-Quentin

The first aim of this paper is to establish the weak convergence 
rate of nonlinear two-time-scale stochastic approximation algorithms. 
Its second aim is to introduce the averaging principle in the context of 
two-time-scale stochastic approximation algorithms. We first define 
the notion of asymptotic efficiency in this framework, then introduce 
the averaged two-time-scale stochastic approximation algorithm, and 
finally establish its weak convergence rate. We show, in particular,
that both components of the averaged two-time-scale stochastic
approximation algorithm simultaneously converge at the optimal rate
$\sqrt{n}$.



1. Introduction. Let

$$f:\mathbb{R}^d\times\mathbb{R}^{d'}\to\mathbb{R}^d,\ (\theta,\mu)\mapsto f(\theta,\mu) \quad\text{and}\quad g:\mathbb{R}^d\times\mathbb{R}^{d'}\to\mathbb{R}^{d'},\ (\theta,\mu)\mapsto g(\theta,\mu)$$

be two unknown functions, and let $(\theta^*,\mu^*)$ be the unique solution to the
equations

$$f(\theta,\mu)=0 \quad\text{and}\quad g(\theta,\mu)=0.$$

Assume that error-contaminated observations of $f(\theta,\mu)$ and $g(\theta,\mu)$ are
available at any level $(\theta,\mu)$. The two-time-scale stochastic approximation
algorithm, which allows the recursive approximation of $(\theta^*,\mu^*)$, is defined as

$$\theta_{n+1} = \theta_n + \beta_n X_{n+1}, \tag{1}$$

$$\mu_{n+1} = \mu_n + \gamma_n Y_{n+1}, \tag{2}$$

where $X_{n+1}$ and $Y_{n+1}$ are error-contaminated observations of $f(\theta_n,\mu_n)$ and
$g(\theta_n,\mu_n)$, respectively, and where the step sizes $(\beta_n)$ and $(\gamma_n)$ are two
positive nonrandom sequences converging to zero with different rates.
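As an illustration of the recursions (1)-(2), here is a minimal Python sketch. Everything specific in it is an assumption made for the example and does not come from the paper: the scalar linear toy functions, the noise level, the root $(1, 2)$, the names `two_time_scale`, `f_obs` and `g_obs`, and the polynomial step sizes $\beta_n = \beta_0 n^{-b}$, $\gamma_n = \gamma_0 n^{-a}$ with $a < b$.

```python
import random


def two_time_scale(f_obs, g_obs, theta0, mu0, n_iter,
                   beta0=1.0, gamma0=1.0, b=0.9, a=0.6):
    """Run theta_{n+1} = theta_n + beta_n X_{n+1}, mu_{n+1} = mu_n + gamma_n Y_{n+1}.

    f_obs and g_obs return noisy observations of f and g at the current point.
    Step sizes are beta_n = beta0 * n**(-b) and gamma_n = gamma0 * n**(-a).
    """
    theta, mu = float(theta0), float(mu0)
    for n in range(1, n_iter + 1):
        beta_n = beta0 * n ** (-b)
        gamma_n = gamma0 * n ** (-a)
        # Both observations are taken at (theta_n, mu_n) before either update.
        x_next = f_obs(theta, mu)
        y_next = g_obs(theta, mu)
        theta, mu = theta + beta_n * x_next, mu + gamma_n * y_next
    return theta, mu


# Hypothetical toy problem: linear f and g whose unique common zero is (1, 2).
rng = random.Random(0)
THETA_STAR, MU_STAR = 1.0, 2.0


def f_obs(theta, mu):
    return -(theta - THETA_STAR) + 0.5 * (mu - MU_STAR) + 0.1 * rng.gauss(0.0, 1.0)


def g_obs(theta, mu):
    return 0.5 * (theta - THETA_STAR) - (mu - MU_STAR) + 0.1 * rng.gauss(0.0, 1.0)


theta_n, mu_n = two_time_scale(f_obs, g_obs, 0.0, 0.0, n_iter=20000)
print(theta_n, mu_n)
```

In this toy setting both iterates end up close to the root; the sketch says nothing about rates, which are the subject of the results below.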

Over the past few years, several such algorithms have been proposed for
various applications (see [1, 3, 4, 12, 13]), and criteria ensuring the almost
sure convergence of $(\theta_n,\mu_n)$ to $(\theta^*,\mu^*)$ have been established by Borkar [5],
Konda and Borkar [12] and Konda and Tsitsiklis [13]. To our knowledge, the
only existing result on the convergence rate of the two-time-scale stochastic
approximation algorithm (1)-(2) is the one of Konda and Tsitsiklis [14]. In
the case when the functions $f$ and $g$ are linear and when $\lim_{n\to\infty}\beta_n/\gamma_n = 0$,
Konda and Tsitsiklis [14] establish that the fastest component $\theta_n$ satisfies
the following central limit theorem (CLT):

$$\sqrt{\beta_n^{-1}}\,(\theta_n-\theta^*)\xrightarrow{\mathcal{D}}\mathcal{N}(0,\Sigma_\theta), \tag{3}$$

where $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution, $\mathcal{N}$ the Gaussian distribution,
and where the asymptotic covariance matrix $\Sigma_\theta$ is defined in (8) below.
Moreover, it can be conjectured from their analysis that the slowest
component $\mu_n$ fulfills the CLT:

$$\sqrt{\gamma_n^{-1}}\,(\mu_n-\mu^*)\xrightarrow{\mathcal{D}}\mathcal{N}(0,\Sigma_\mu) \tag{4}$$

[where the asymptotic covariance matrix $\Sigma_\mu$ is defined in (9) below]. The
result (3) of [14] is thus very surprising. As a matter of fact, it shows that
the slowest component $\mu_n$ [which, through $X_{n+1}$, is present in the recursive
definition (1) of $\theta_n$] has no effect on the convergence rate of the fastest
component $\theta_n$, except in the expression of the asymptotic covariance matrix
$\Sigma_\theta$. It is then natural to wonder whether this phenomenon is specific to the
case of the functions $f$ and $g$ being linear or not.

Our first aim in this paper is to study the weak joint convergence rate of
$\theta_n$ and $\mu_n$ in the case where the functions $f$ and $g$ are nonlinear. We still
consider the case $\lim_{n\to\infty}\beta_n/\gamma_n = 0$, and prove that

$$\begin{pmatrix}\sqrt{\beta_n^{-1}}\,(\theta_n-\theta^*)\\ \sqrt{\gamma_n^{-1}}\,(\mu_n-\mu^*)\end{pmatrix}\xrightarrow{\mathcal{D}}\mathcal{N}\left(0,\begin{pmatrix}\Sigma_\theta & 0\\ 0 & \Sigma_\mu\end{pmatrix}\right). \tag{5}$$

The CLT (5) extends, in particular, the result (3) of [14] to the case where the
functions $f$ and $g$ are nonlinear. Let us underline that, as explained in [14],
in the case $(\beta_n) = (\gamma_n)$, the algorithm defined by (1)-(2) reduces to a
single-time-scale stochastic approximation algorithm used for the search of the
zero of the function $h:\mathbb{R}^{d+d'}\to\mathbb{R}^{d+d'}$ defined by $h(\theta,\mu) = (f(\theta,\mu),g(\theta,\mu))$.
The convergence rate of such single-time-scale stochastic approximation
algorithms has been widely studied (see, among many others, Nevel'son and
Has'minskii [21], Kushner and Clark [15], Benveniste, Métivier and Priouret
[2], Ljung, Pflug and Walk [10] and Duflo [9]), but the existing techniques do
not apply when the step sizes $(\beta_n)$ and $(\gamma_n)$ have two different convergence
rates, that is, in the context of two-time-scale stochastic approximation
algorithms. Let us also point out that the two-time-scale iterations considered
by Konda and Tsitsiklis [14] and in the present paper are totally different
from those that arise in the study of the tracking ability of adaptive
algorithms (see [2]) or in the joint approximation of the location and size of
the maximum of a regression function (see [20]); the specific difficulty in the
present context lies in the double dependency between both components
$\theta_n$ and $\mu_n$ [$\theta_n$ defined by (1) depends, through $X_{n+1}$, on $\mu_n$ defined by (2),
whereas $\mu_n$ defined by (2) depends, through $Y_{n+1}$, on $\theta_n$ defined by (1)].
Let us finally underline that the techniques we use to prove (5) (introduction
of exponential martingales and recourse to successive almost sure upper
bounds) radically differ from those employed by Konda and Tsitsiklis [14]
to establish (3); let us also mention that the additional difficulty induced by
the nonlinearity of the functions $f$ and $g$ will be highlighted in our proof of
(5).

Now, let us note that, in view of (3) and (5), the recommended choice
of the fastest step size $(\beta_n)$ is $(\beta_n) = (\beta_0 n^{-1})$, since it is the choice which
ensures that the fastest component $\theta_n$ converges with the optimal rate $\sqrt{n}$.
However, this optimal choice induces conditions on the parameter $\beta_0$ that
are difficult to handle because they depend on an unknown parameter. The
problem due to the choice of the optimal step size $(\beta_n) = (\beta_0 n^{-1})$ is now
well known in the context of single-time-scale stochastic approximation
algorithms, and the method widely employed in this framework to circumvent
this problem is the use of the averaging principle independently introduced
by Ruppert [26] and Polyak [24], and then widely discussed and extended
(see, among many others, Yin [27], Delyon and Juditsky [6], Polyak and
Juditsky [25], Kushner and Yang [16], Dippon and Renz [7, 8], Duflo [9],
Kushner and Yin [17] and Pelletier [23]).

Our second aim in this paper is to introduce the averaging principle in
the context of two-time-scale stochastic approximation algorithms. We first
define the notion of asymptotic efficiency in this framework, then introduce
the averaged two-time-scale stochastic approximation algorithm, and finally
establish its weak convergence rate. We prove, in particular, that, by
choosing the step sizes $(\beta_n)$ and $(\gamma_n)$ equal to $(\beta_n) = (\beta_0 n^{-b})$ and $(\gamma_n) = (\gamma_0 n^{-a})$
with $1/2 < a < b < 1$, and by defining the averaged two-time-scale algorithm
by setting

$$\bar\theta_n = \frac{1}{n}\sum_{k=1}^{n}\theta_k \quad\text{and}\quad \bar\mu_n = \frac{1}{n}\sum_{k=1}^{n}\mu_k,$$

where $\theta_k$ and $\mu_k$ are defined in (1)-(2), we obtain an asymptotically efficient
two-time-scale algorithm, which satisfies the CLT

$$\begin{pmatrix}\sqrt{n}\,(\bar\theta_n-\theta^*)\\ \sqrt{n}\,(\bar\mu_n-\mu^*)\end{pmatrix}\xrightarrow{\mathcal{D}}\mathcal{N}(0,C),$$

where the asymptotic covariance matrix $C$ is precisely defined (see
Theorem 2). The striking aspect of this result is that averaging leads to a
two-time-scale algorithm whose components $\bar\theta_n$ and $\bar\mu_n$ simultaneously converge
with the optimal rate $\sqrt{n}$.
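The averaged algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy linear functions, the noise level, the root $(1, 2)$ and all function names are hypothetical, and the exponents are picked so that $1/2 < a < b < 1$. The running averages are updated incrementally, which is equivalent to recomputing $\frac{1}{n}\sum_{k=1}^n\theta_k$ and $\frac{1}{n}\sum_{k=1}^n\mu_k$ at each step.

```python
import random


def averaged_two_time_scale(f_obs, g_obs, theta1, mu1, n_iter,
                            beta0=1.0, gamma0=1.0, b=0.9, a=0.6):
    """Return the averages of theta_1..theta_n and mu_1..mu_n of the recursions.

    The raw iterates follow theta_{k+1} = theta_k + beta_k X_{k+1} and
    mu_{k+1} = mu_k + gamma_k Y_{k+1} with beta_k = beta0 * k**(-b),
    gamma_k = gamma0 * k**(-a).
    """
    theta, mu = float(theta1), float(mu1)
    theta_bar, mu_bar = 0.0, 0.0
    for k in range(1, n_iter + 1):
        # Incremental mean: after this line theta_bar is the mean of theta_1..theta_k.
        theta_bar += (theta - theta_bar) / k
        mu_bar += (mu - mu_bar) / k
        x_next = f_obs(theta, mu)
        y_next = g_obs(theta, mu)
        theta, mu = (theta + beta0 * k ** (-b) * x_next,
                     mu + gamma0 * k ** (-a) * y_next)
    return theta_bar, mu_bar


# Hypothetical toy problem: linear f and g whose unique common zero is (1, 2).
rng = random.Random(1)


def f_obs(theta, mu):
    return -(theta - 1.0) + 0.5 * (mu - 2.0) + 0.1 * rng.gauss(0.0, 1.0)


def g_obs(theta, mu):
    return 0.5 * (theta - 1.0) - (mu - 2.0) + 0.1 * rng.gauss(0.0, 1.0)


theta_bar_n, mu_bar_n = averaged_two_time_scale(f_obs, g_obs, 0.0, 0.0, n_iter=20000)
print(theta_bar_n, mu_bar_n)
```

Note that averaging is applied to both components with the same weights $1/n$, even though the underlying iterates use different step sizes.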

Our paper is now organized as follows. Section 2 is devoted to the study of 
the convergence rate of nonlinear two-time-scale stochastic approximation 
algorithms. We first precisely state our assumptions and main results; then, 
we give the outlines of the proof of our main results, postponing the technical 
parts until the Appendix. Section 3 is reserved for averaging. The notion of 
asymptotic efficiency of two-time-scale stochastic approximation algorithms 
is introduced in Section 3.1; the weak convergence rate of the averaged 
two-time-scale algorithm is stated and then proved in Sections 3.2 and 3.3, 
respectively. 

2. Convergence rate of nonlinear two-time-scale stochastic approximation 
algorithms. 

2.1. Assumptions and notation. For any square matrix A, we set 

$$\lambda^{(A)} = -\max\{\mathcal{R}e(\lambda),\ \lambda\in \mathrm{Sp}(A)\},$$

where $\mathrm{Sp}(A)$ denotes the spectrum of $A$. Moreover, $\|\cdot\|$ denotes the
Euclidean vector norm in $\mathbb{R}^d$, $\mathbb{R}^{d'}$ and $\mathbb{R}^{d+d'}$ without distinction, and $|||\cdot|||$ the
matrix norm induced by the Euclidean vector norm.
The assumptions we require are the following:

(A1) $\lim_{n\to\infty}\theta_n = \theta^*$ a.s. and $\lim_{n\to\infty}\mu_n = \mu^*$ a.s.



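(A2) (i) There exists a neighborhood $\mathcal{U}$ of $(\theta^*,\mu^*)$ such that, for all
$(\theta,\mu)\in\mathcal{U}$,

$$\begin{pmatrix}f(\theta,\mu)\\ g(\theta,\mu)\end{pmatrix} = \begin{pmatrix}Q_{11} & Q_{12}\\ Q_{21} & Q_{22}\end{pmatrix}\begin{pmatrix}\theta-\theta^*\\ \mu-\mu^*\end{pmatrix} + O\left(\left\|\begin{pmatrix}\theta-\theta^*\\ \mu-\mu^*\end{pmatrix}\right\|^2\right).$$

(ii) Set

$$H = Q_{11} - Q_{12}Q_{22}^{-1}Q_{21}. \tag{6}$$

We have $\lambda^{(H)} > 0$ and $\lambda^{(Q_{22})} > 0$.

(A3) (i) $(\beta_n) = (\beta_0 n^{-b})$ and $(\gamma_n) = (\gamma_0 n^{-a})$ with $\beta_0 > 0$, $\gamma_0 > 0$ and
$1/2 < a < b \le 1$.

(ii) If $b = 1$, then $\beta_0 > 1/[2\lambda^{(H)}]$.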
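(A4) The error-contaminated observations can be written as

$$X_{n+1} = f(\theta_n,\mu_n) + \psi_n^{(\theta)} + V_{n+1},$$

$$Y_{n+1} = g(\theta_n,\mu_n) + \psi_n^{(\mu)} + W_{n+1},$$

and denoting by $\mathcal{F}_n$ the $\sigma$-field spanned by $\{V_i, W_j, \theta_k, \mu_l, \psi_{k'}^{(\theta)}, \psi_{l'}^{(\mu)},\ i, j,
k, l, k', l' \le n\}$, we have the following:

(i) $E(V_{n+1}|\mathcal{F}_n) = 0$ and $E(W_{n+1}|\mathcal{F}_n) = 0$ a.s.

(ii) There exists a positive matrix $\Gamma$ such that

$$\lim_{n\to\infty} E\left(\begin{pmatrix}V_{n+1}\\ W_{n+1}\end{pmatrix}\begin{pmatrix}V_{n+1}^T & W_{n+1}^T\end{pmatrix}\Big|\,\mathcal{F}_n\right) = \Gamma = \begin{pmatrix}\Gamma_{11} & \Gamma_{12}\\ \Gamma_{21} & \Gamma_{22}\end{pmatrix} \quad\text{a.s.}$$

(iii) There exists $m > 2/a$ such that $\sup_n E(\|V_{n+1}\|^m|\mathcal{F}_n) < \infty$ and
$\sup_n E(\|W_{n+1}\|^m|\mathcal{F}_n) < \infty$ a.s.

(iv)

$$\psi_n^{(\theta)} = r_n^{(\theta)} + O(\|\theta_n-\theta^*\|^2 + \|\mu_n-\mu^*\|^2),$$

$$\psi_n^{(\mu)} = r_n^{(\mu)} + O(\|\theta_n-\theta^*\|^2 + \|\mu_n-\mu^*\|^2),$$

with $\|r_n^{(\theta)}\| + \|r_n^{(\mu)}\| = o(\sqrt{\beta_n})$ a.s.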

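Let us specify that the matrices $Q_{11}$ and $\Gamma_{11}$ (resp. $Q_{22}$ and $\Gamma_{22}$) in (A2)(i)
and (A4)(ii) are $d\times d$ (resp. $d'\times d'$) matrices; the matrices $Q_{12}$, $Q_{21}$, $\Gamma_{12}$
and $\Gamma_{21}$ are of appropriate dimension. Set

$$\begin{aligned}\Gamma_\theta &= \lim_{n\to\infty} E\big([V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1}][V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1}]^T\,\big|\,\mathcal{F}_n\big)\\ &= \Gamma_{11} + Q_{12}Q_{22}^{-1}\Gamma_{22}[Q_{22}^{-1}]^TQ_{12}^T - \Gamma_{12}[Q_{22}^{-1}]^TQ_{12}^T - Q_{12}Q_{22}^{-1}\Gamma_{21}.\end{aligned} \tag{7}$$

We can now give the explicit definition of the asymptotic covariance matrices
$\Sigma_\theta$ and $\Sigma_\mu$ which stand in (3), (4) and (5):

$$\Sigma_\theta = \int_0^\infty \exp\left[\left(H + \frac{\mathbb{1}_{b=1}}{2\beta_0}I\right)t\right]\Gamma_\theta\exp\left[\left(H^T + \frac{\mathbb{1}_{b=1}}{2\beta_0}I\right)t\right]dt, \tag{8}$$

$$\Sigma_\mu = \int_0^\infty \exp[Q_{22}t]\,\Gamma_{22}\exp[Q_{22}^Tt]\,dt. \tag{9}$$

Let us mention that the matrices $\Sigma_\theta$ and $\Sigma_\mu$ are the solutions of the
Lyapounov equations

$$\left[H + \frac{\mathbb{1}_{b=1}}{2\beta_0}I\right]\Sigma_\theta + \Sigma_\theta\left[H^T + \frac{\mathbb{1}_{b=1}}{2\beta_0}I\right] = -\Gamma_\theta$$

and

$$Q_{22}\Sigma_\mu + \Sigma_\mu Q_{22}^T = -\Gamma_{22},$$

respectively (see Lemma 3.1.3 in (year?)).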
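The covariance matrices can be computed numerically from their Lyapounov characterization. The NumPy sketch below is an illustration, not part of the paper. It assumes the equations read $Q_{22}\Sigma_\mu + \Sigma_\mu Q_{22}^T = -\Gamma_{22}$ and $[H + cI]\Sigma_\theta + \Sigma_\theta[H^T + cI] = -\Gamma_\theta$, with $c = 1/(2\beta_0)$ when $b = 1$ and $c = 0$ when $b < 1$, and it uses a plain Kronecker-product linear solve rather than a dedicated Lyapunov routine. The function names and the numerical values are hypothetical.

```python
import numpy as np


def solve_lyapunov(A, Q):
    """Solve A X + X A^T = -Q for X by row-major vectorization."""
    n = A.shape[0]
    eye = np.eye(n)
    # With row-major flattening, vec(A X) = kron(A, I) vec(X)
    # and vec(X A^T) = kron(I, A) vec(X).
    K = np.kron(A, eye) + np.kron(eye, A)
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)


def asymptotic_covariances(Q11, Q12, Q21, Q22, G11, G12, G21, G22, beta0=None):
    """Return H, Gamma_theta, Sigma_theta, Sigma_mu.

    beta0=None stands for the case b < 1; otherwise b = 1 is assumed and
    the shift 1/(2*beta0) is added to H in the equation for Sigma_theta.
    """
    Q22_inv = np.linalg.inv(Q22)
    H = Q11 - Q12 @ Q22_inv @ Q21
    A = Q12 @ Q22_inv
    gamma_theta = G11 + A @ G22 @ A.T - G12 @ A.T - A @ G21
    shift = 0.0 if beta0 is None else 1.0 / (2.0 * beta0)
    sigma_theta = solve_lyapunov(H + shift * np.eye(H.shape[0]), gamma_theta)
    sigma_mu = solve_lyapunov(Q22, G22)
    return H, gamma_theta, sigma_theta, sigma_mu


# Hypothetical scalar example (d = d' = 1), written with 1 x 1 matrices.
Q11, Q12 = np.array([[-1.0]]), np.array([[0.5]])
Q21, Q22 = np.array([[0.5]]), np.array([[-1.0]])
G11, G12 = np.array([[0.01]]), np.array([[0.0]])
G21, G22 = np.array([[0.0]]), np.array([[0.01]])

H, gamma_theta, sigma_theta, sigma_mu = asymptotic_covariances(
    Q11, Q12, Q21, Q22, G11, G12, G21, G22)
print(H, gamma_theta, sigma_theta, sigma_mu)
```

The Kronecker approach builds a $d^2\times d^2$ system, so it is only reasonable for small dimensions.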
Comments on the assumptions.

1. We refer to [5, 12, 13] for quite general conditions that ensure the
consistency assumption (A1). Let us underline that, in the case where $f$
and $g$ are linear, $\psi_n^{(\theta)} = 0$ and $\psi_n^{(\mu)} = 0$, assumption (A1) is unnecessary; as a
matter of fact, as noted by Konda and Tsitsiklis [14], assumptions (A2)-(A4)
imply (A1) in this particular case. Let us also mention that a particular
example of two-time-scale stochastic approximation algorithm is the
well-known Polyak-Ruppert averaging; in this framework, (1)-(2) reduces to

$$\theta_{n+1} = \theta_n + \frac{1}{n}(\mu_n - \theta_n),$$

$$\mu_{n+1} = \mu_n + \gamma_nY_{n+1},$$

where $Y_{n+1}$ is an error-contaminated observation at $\mu_n$ of an unknown
function $h$, and $\lim_{n\to\infty}n\gamma_n = \infty$; (A1) then comes down to the assumption
$\lim_{n\to\infty}\mu_n = \mu^*$ [where $h(\mu^*) = 0$], and conditions which ensure this latter
assumption can be found, among many others, in [9, 15, 18].

2. Assumptions (A2)(ii) and (A3)(ii) ensure that the matrices $\Sigma_\theta$ and
$\Sigma_\mu$ are well defined. As a matter of fact, the conditions in (A2)(ii) mean
that the matrices $H$ and $Q_{22}$ are attractive (or Hurwitz) and, in the case
$b = 1$, it follows from the condition in (A3)(ii) that the matrix $[H + (2\beta_0)^{-1}I]$ is
attractive.

3. To establish the convergence rate of the two-time-scale stochastic
approximation algorithm (1)-(2), Konda and Tsitsiklis [14] assume that the
functions $f$ and $g$ are linear, that is, that

$$\begin{pmatrix}f(\theta,\mu)\\ g(\theta,\mu)\end{pmatrix} = \begin{pmatrix}Q_{11} & Q_{12}\\ Q_{21} & Q_{22}\end{pmatrix}\begin{pmatrix}\theta-\theta^*\\ \mu-\mu^*\end{pmatrix}.$$

Moreover, their framework corresponds to the case where (A4) is fulfilled with
$\psi_n^{(\theta)} = 0$, $\psi_n^{(\mu)} = 0$, and $(V_n, W_n)$ are independent random vectors with zero
mean and common covariance $\Gamma$. On the other hand, their conditions on the
step sizes $(\beta_n)$ and $(\gamma_n)$ are more general than ours.
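The remark in comment 1 that Polyak-Ruppert averaging is itself a two-time-scale scheme can be checked on a small example. The Python sketch below assumes that the first recursion reads $\theta_{n+1} = \theta_n + \frac{1}{n}(\mu_n - \theta_n)$ for $n \ge 1$ (the display is damaged in the source, so this reading is an assumption); under it, $\theta_{n+1}$ is exactly the arithmetic mean of $\mu_1,\dots,\mu_n$, whatever $\theta_1$ is. The function name and the input values are hypothetical.

```python
def running_average_via_recursion(mu_values, theta1=0.0):
    """Iterate theta_{n+1} = theta_n + (mu_n - theta_n) / n for n = 1, 2, ...

    Returns the list [theta_2, theta_3, ...]; theta1 is arbitrary because the
    first step (n = 1) overwrites it with mu_1.
    """
    theta = float(theta1)
    out = []
    for n, mu_n in enumerate(mu_values, start=1):
        theta = theta + (mu_n - theta) / n
        out.append(theta)
    return out


mu_values = [3.0, 1.0, 5.0, 7.0]
thetas = running_average_via_recursion(mu_values, theta1=42.0)
means = [sum(mu_values[:n]) / n for n in range(1, len(mu_values) + 1)]
print(thetas, means)
```

Here the averaged component uses the step size $1/n$, while the raw component $\mu_n$ uses the larger step size $\gamma_n$, which is the two-time-scale structure.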

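2.2. Main results. Our main result in this section is the following
theorem.

Theorem 1 [Joint weak convergence rate of $(\theta_n)$ and $(\mu_n)$]. Let $(\theta_n,\mu_n)$
be defined by the recursive equations (1)-(2). Under assumptions (A1)-(A4),
we have

$$\begin{pmatrix}\sqrt{\beta_n^{-1}}\,(\theta_n-\theta^*)\\ \sqrt{\gamma_n^{-1}}\,(\mu_n-\mu^*)\end{pmatrix}\xrightarrow{\mathcal{D}}\mathcal{N}\left(0,\begin{pmatrix}\Sigma_\theta & 0\\ 0 & \Sigma_\mu\end{pmatrix}\right),$$

where $\Sigma_\theta$ and $\Sigma_\mu$ are defined in (8) and (9), respectively.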
The following proposition, which is of independent interest, will be a key
tool for the study of the weak convergence rate of the averaged
two-time-scale stochastic approximation algorithm.

Proposition 1 [Strong convergence rate of $(\theta_n)$ and $(\mu_n)$]. Let $(\theta_n,\mu_n)$
be defined by the recursive equations (1)-(2). Under assumptions (A1)-(A4),
we have

$$\|\theta_n-\theta^*\| = O\left(\sqrt{\beta_n\log\Big(\sum_{k=1}^n\beta_k\Big)}\right) \quad\text{a.s.}$$

and

$$\|\mu_n-\mu^*\| = O\left(\sqrt{\gamma_n\log\Big(\sum_{k=1}^n\gamma_k\Big)}\right) \quad\text{a.s.}$$

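2.3. Proof of Theorem 1 and Proposition 1. Throughout the proof of
Theorem 1 and Proposition 1, we assume, without loss of generality, that
$\theta^* = 0$ and $\mu^* = 0$. In view of assumptions (A1), (A2) and (A4), we can write

$$\theta_{n+1} = \theta_n + \beta_n(Q_{11}\theta_n + Q_{12}\mu_n + \rho_n^{(\theta)} + r_n^{(\theta)} + V_{n+1}), \tag{10}$$

$$\mu_{n+1} = \mu_n + \gamma_n(Q_{21}\theta_n + Q_{22}\mu_n + \rho_n^{(\mu)} + r_n^{(\mu)} + W_{n+1}), \tag{11}$$

where

$$\|\rho_n^{(\theta)}\| = O(\|\theta_n\|^2 + \|\mu_n\|^2) \quad\text{and}\quad \|\rho_n^{(\mu)}\| = O(\|\theta_n\|^2 + \|\mu_n\|^2). \tag{12}$$

Note that (11) gives

$$\mu_n = Q_{22}^{-1}\gamma_n^{-1}[\mu_{n+1} - \mu_n] - Q_{22}^{-1}(Q_{21}\theta_n + \rho_n^{(\mu)} + r_n^{(\mu)} + W_{n+1}),$$

and thus, in view of (10), it follows that

$$\begin{aligned}\theta_{n+1} &= \theta_n + \beta_n\big(Q_{11}\theta_n + Q_{12}Q_{22}^{-1}\gamma_n^{-1}[\mu_{n+1} - \mu_n]\\ &\qquad - Q_{12}Q_{22}^{-1}(Q_{21}\theta_n + \rho_n^{(\mu)} + r_n^{(\mu)} + W_{n+1}) + \rho_n^{(\theta)} + r_n^{(\theta)} + V_{n+1}\big)\\ &= \theta_n + \beta_nH\theta_n + \beta_n\gamma_n^{-1}Q_{12}Q_{22}^{-1}[\mu_{n+1} - \mu_n]\\ &\qquad + \beta_n(V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1})\\ &\qquad + \beta_n([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]),\end{aligned}$$

where $H$ is defined in (6). Now, set

$$u_n = \sum_{k=1}^n\beta_k, \tag{13}$$

$$L_{n+1}^{(\theta)} = e^{u_nH}\sum_{k=1}^n e^{-u_kH}\beta_k(V_{k+1} - Q_{12}Q_{22}^{-1}W_{k+1}), \tag{14}$$

$$R_{n+1}^{(\theta)} = e^{u_nH}\sum_{k=1}^n e^{-u_kH}\beta_k\gamma_k^{-1}Q_{12}Q_{22}^{-1}[\mu_{k+1} - \mu_k], \tag{15}$$

$$\Delta_{n+1}^{(\theta)} = \theta_{n+1} - L_{n+1}^{(\theta)} - R_{n+1}^{(\theta)} \tag{16}$$

and

$$s_n = \sum_{k=1}^n\gamma_k,$$

$$L_{n+1}^{(\mu)} = e^{s_nQ_{22}}\sum_{k=1}^n e^{-s_kQ_{22}}\gamma_kW_{k+1}, \tag{17}$$

$$R_{n+1}^{(\mu)} = e^{s_nQ_{22}}\sum_{k=1}^n e^{-s_kQ_{22}}\gamma_kQ_{21}[L_k^{(\theta)} + R_k^{(\theta)}], \tag{18}$$

$$\Delta_{n+1}^{(\mu)} = \mu_{n+1} - L_{n+1}^{(\mu)} - R_{n+1}^{(\mu)}. \tag{19}$$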

The main idea to establish Theorem 1 and Proposition 1 is to prove that the
sequences $(R_n^{(\theta)})$ and $(\Delta_n^{(\theta)})$ are negligible in front of $(L_n^{(\theta)})$ on the one hand,
and that the sequences $(R_n^{(\mu)})$ and $(\Delta_n^{(\mu)})$ are negligible in front of $(L_n^{(\mu)})$
on the other hand; the convergence rates of $(\theta_n)$ and $(\mu_n)$ are then given
by the ones of $(L_n^{(\theta)})$ and $(L_n^{(\mu)})$, respectively. Let us note that, even though
the sequence $(\mu_n)$ goes to zero a.s. slower than the sequence $(\theta_n)$ does, we
shall prove that the term $(R_n^{(\theta)})$ goes to zero a.s. faster than the sequence
$(\theta_n)$ does. This is due to an averaging effect, the sequence $(R_n^{(\theta)})$ bringing
in a weighted sum of the differences $\mu_{k+1} - \mu_k$. In the sequel we shall come
back to this effect several times.

Applying Lyapounov's theorem, we obtain the following lemma (see
Section A.2 for the technical details).

Lemma 1 [Joint weak convergence rate of $(L_n^{(\theta)})$ and $(L_n^{(\mu)})$]. We have

$$\begin{pmatrix}\sqrt{\beta_n^{-1}}\,L_n^{(\theta)}\\ \sqrt{\gamma_n^{-1}}\,L_n^{(\mu)}\end{pmatrix}\xrightarrow{\mathcal{D}}\mathcal{N}\left(0,\begin{pmatrix}\Sigma_\theta & 0\\ 0 & \Sigma_\mu\end{pmatrix}\right).$$

Moreover, the following lemma is proved in [22].

Lemma 2 [Strong convergence rate of $(L_n^{(\theta)})$ and $(L_n^{(\mu)})$]. We have

$$\|L_n^{(\theta)}\| = O(\sqrt{\beta_n\log u_n}) \quad\text{a.s.}$$

and

$$\|L_n^{(\mu)}\| = O(\sqrt{\gamma_n\log s_n}) \quad\text{a.s.}$$

Theorem 1 (resp. Proposition 1) thus follows from the combination of
Lemma 1 (resp. of Lemma 2), and of the following two lemmas (which
imply, in particular, that the sequences $(\beta_n^{-1/2}[R_n^{(\theta)} + \Delta_n^{(\theta)}])$ and
$(\gamma_n^{-1/2}[R_n^{(\mu)} + \Delta_n^{(\mu)}])$ go to zero a.s.):

Lemma 3 [Strong convergence rate of $(R_n^{(\theta)})$ and $(R_n^{(\mu)})$].

1. There exists $s > b/2$ such that $\|R_n^{(\theta)}\| = O(n^{-s})$ a.s.

2. $\|R_n^{(\mu)}\| = O(\sqrt{\beta_n\log u_n})$ a.s.

Lemma 4 [Strong convergence rate of $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$]. We have

$$\|\Delta_n^{(\theta)}\| = o(\sqrt{\beta_n}) \quad\text{a.s.},$$

$$\|\Delta_n^{(\mu)}\| = o(\sqrt{\beta_n}) \quad\text{a.s.}$$

The key point in the proof of Theorem 1 and Proposition 1 is thus the 
proof of Lemmas 3 and 4. The rest of Section 2 is devoted to this proof (we 
shall refer to the Appendix for the technical details). Let us first give the 
strategy to prove these lemmas. 

We note that, to obtain an upper bound of $(R_n^{(\mu)})$, we need to have an
upper bound of $(R_n^{(\theta)})$, which in turn requires an upper bound of $(\mu_n)$.
The main idea to prove Lemma 3 is thus to proceed by successive upper
bounds. In a first step, we shall start with the only upper bound of $(\mu_n)$
available to us, that is, in view of assumption (A1), with $\|\mu_n\| = o(1)$. This
will enable us to establish a first upper bound of $(R_n^{(\theta)})$ and then of $(R_n^{(\mu)})$.
With these preliminary upper bounds, we shall be able to prove preliminary
upper bounds for $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$. Using (19) and applying Lemma 2, we
shall then slightly improve the first upper bound of $(\mu_n)$; starting with this
second upper bound of $(\mu_n)$, we shall then repeat the procedure previously
described to find a third upper bound of $(\mu_n)$, which slightly improves the
second one, and we shall carry on these successive upper bounds until we
obtain the adequate upper bounds of $(\mu_n)$, $(R_n^{(\theta)})$, $(R_n^{(\mu)})$, $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$.

Let us mention that the step, which consists in deducing upper bounds
of $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$ from upper bounds of $(\mu_n)$, $(L_n^{(\theta)})$, $(L_n^{(\mu)})$, $(R_n^{(\theta)})$ and
$(R_n^{(\mu)})$, is quite straightforward in the case when the functions $f$ and $g$
are linear, $\psi_n^{(\theta)} = 0$ and $\psi_n^{(\mu)} = 0$ (see Remark 4 below); however, in the
case where the functions $f$ and $g$ are nonlinear, this step too requires
computing successive upper bounds [we shall first show that $\|\Delta_n^{(\mu)}\| = o(1)$,
and then shall recursively improve the upper bound of $(\Delta_n^{(\mu)})$ until we find
the adequate upper bound of $(\Delta_n^{(\mu)})$].

Our proof of Lemmas 3 and 4 is now organized as follows. We first define
Conditions (C) and (C') [that are expressed with respect to the step sizes
$(\beta_n)$ and $(\gamma_n)$ resp.] for a nonrandom sequence, conditions which will be used
throughout the proof. Then, in Section 2.3.1, we show how the knowledge
of an upper bound of $(\mu_n)$ and of $(\Delta_n^{(\mu)})$ makes it possible to establish upper bounds
of $(R_n^{(\theta)})$, $(R_n^{(\mu)})$, $(\Delta_n^{(\theta)})$ and to improve the upper bound of $(\Delta_n^{(\mu)})$. Section
2.3.2 is devoted to the body of the proof of Lemmas 3 and 4.

Definition 1 [Condition (C)]. Let $(w_n)$ be a sequence of real numbers.
We say that $(w_n)$ satisfies Condition (C) if $(w_n)$ is positive and bounded
and if:

• in the case $b = 1$, there exist $\omega \ge 0$ and a nondecreasing slowly varying
function $\mathcal{L}$ such that $w_n = n^{-\omega}\mathcal{L}(n)$;

• in the case $b < 1$,

$$\frac{w_n}{w_{n+1}} = 1 + o(\beta_n).$$

Definition 2 [Condition (C')]. Let $(w_n)$ be a sequence of real numbers.
We say that $(w_n)$ satisfies Condition (C') if $(w_n)$ is positive and bounded
and if

$$\frac{w_n}{w_{n+1}} = 1 + o(\gamma_n).$$

Remark 1. If $b = 1$ and if $(w_n)$ satisfies Condition (C) with $\omega = 0$, then
the function $\mathcal{L}$ is necessarily bounded.

Remark 2. In the case $b < 1$, if $(w_n)$ satisfies Condition (C), then $(w_n)$
satisfies Condition (C').

2.3.1. Intermediate upper bounds. We can now state the following lemma,
which gives an upper bound of $(R_n^{(\theta)})$ and $(R_n^{(\mu)})$ under the assumption
$\|\mu_n\| = O(w_n)$, where $(w_n)$ is a nonrandom sequence satisfying Conditions
(C) and (C'). The proof of this lemma only requires classical computations,
and is thus postponed until the Appendix (see Section A.3).

Lemma 5 [Intermediate upper bound of $(R_n^{(\theta)})$ and $(R_n^{(\mu)})$]. Assume that
there exists a nonrandom sequence $(w_n)$ satisfying Conditions (C) and (C'),
[...]

$$\|R_n^{(\mu)}\| = O(\beta_n\gamma_n^{-1}w_n + \sqrt{\beta_n\log u_n}) \quad\text{a.s.}$$

Remark 3. The term $R_n^{(\mu)}$ can be seen as a (matricial) weighted average
of the terms $L_k^{(\theta)} + R_k^{(\theta)}$; the second upper bound in Lemma 5 is established
by proving that the same upper bound holds for the sequence $(L_n^{(\theta)} + R_n^{(\theta)})$
and for its average $(R_n^{(\mu)})$, which seems quite natural. On the other hand,
the term $R_n^{(\theta)}$ can be seen as a (matricial) weighted average of the terms
$\gamma_k^{-1}[\mu_{k+1} - \mu_k]$; the striking aspect of the first upper bound in Lemma 5 is
that, although $\mu_n$ is bounded by $w_n$ and although $\gamma_n^{-1}\to\infty$, the average $R_n^{(\theta)}$
can be bounded by $\beta_n\gamma_n^{-1}w_n$ (which is smaller than $w_n$ since $\beta_n\gamma_n^{-1}\to 0$).
This averaging effect is similar to the one which appears in the study of the
averaged single-time-scale stochastic approximation algorithm introduced
by Ruppert [26] and Polyak [24].

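We now state a lemma, which gives an upper bound of $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$
under the assumption $\|\mu_n\| = O(w_n)$ and $\|\Delta_n^{(\mu)}\| = O(\delta_n^{(\mu)})$, where $(w_n)$ and
$(\delta_n^{(\mu)})$ are two nonrandom sequences satisfying Conditions (C) and (C').

Lemma 6 [Intermediate upper bound of $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$]. Assume that
there exist two nonrandom sequences $(w_n)$ and $(\delta_n^{(\mu)})$ satisfying Conditions
(C) and (C'), and such that $\|\mu_n\| = O(w_n)$ a.s. and $\|\Delta_n^{(\mu)}\| = O(\delta_n^{(\mu)})$ a.s.
We have

$$\|\Delta_n^{(\theta)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}) + o(\sqrt{\beta_n}) \quad\text{a.s.},$$

$$\|\Delta_n^{(\mu)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}) + o(\sqrt{\beta_n}) \quad\text{a.s.}$$

We now give the outlines of the proof of Lemma 6, and refer to the
Appendix for the technical computations.

Outlines of the proof of Lemma 6. We first note that $\Delta_n^{(\theta)}$ and $\Delta_n^{(\mu)}$
satisfy the following recursive expressions (see Section A.4.1 for the algebra
leading to these equations):

$$\begin{aligned}\Delta_{n+1}^{(\theta)} &= (I + \beta_nH)\Delta_n^{(\theta)} + O(\beta_n^2)[L_n^{(\theta)} + R_n^{(\theta)}]\\ &\quad + \beta_n([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]),\end{aligned} \tag{20}$$

$$\begin{aligned}\Delta_{n+1}^{(\mu)} &= (I + \gamma_nQ_{22})\Delta_n^{(\mu)} + O(\gamma_n^2)[L_n^{(\mu)} + R_n^{(\mu)}]\\ &\quad + \gamma_n[\rho_n^{(\mu)} + r_n^{(\mu)} + Q_{21}\Delta_n^{(\theta)}].\end{aligned} \tag{21}$$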
(21) 
Now, set $T$ and $M$ such that $0 < T < \lambda^{(H)}$ and $0 < M < \lambda^{(Q_{22})}$,
respectively. In view of Proposition 3.1.2 in [9], there exist two matrix norms $|||\cdot|||_T$
and $|||\cdot|||_M$, and there exists $a\in\,]0,\inf\{1/T,1/M\}[$ such that, for all $\gamma\le a$,
$|||I + \gamma H|||_T \le 1-\gamma T$ and $|||I + \gamma Q_{22}|||_M \le 1-\gamma M$. For $x$ in $\mathbb{R}^d$ (resp. in $\mathbb{R}^{d'}$),
define $M_d(x) = [x\,x\cdots x]$ (resp. $M_{d'}(x) = [x\,x\cdots x]$) the $d\times d$ (resp. $d'\times d'$)
matrix all of whose columns are $x$. The function $\|\cdot\|_T$ (resp. $\|\cdot\|_M$) defined
on $\mathbb{R}^d$ (resp. on $\mathbb{R}^{d'}$) by $\|x\|_T = |||M_d(x)|||_T$ (resp. by $\|x\|_M = |||M_{d'}(x)|||_M$)
is then a vector norm compatible with the matrix norm $|||\cdot|||_T$ (resp. with
$|||\cdot|||_M$) (see [11], page 297). For $n$ large enough, we thus have

$$\begin{aligned}\|\Delta_{n+1}^{(\theta)}\|_T &\le (1-\beta_nT)\|\Delta_n^{(\theta)}\|_T + \beta_n[O(\beta_n\|L_n^{(\theta)}\|_T + \beta_n\|R_n^{(\theta)}\|_T)]\\ &\quad + \beta_n[O(\|\rho_n^{(\theta)}\|_T + \|r_n^{(\theta)}\|_T + \|Q_{12}Q_{22}^{-1}\rho_n^{(\mu)}\|_T + \|Q_{12}Q_{22}^{-1}r_n^{(\mu)}\|_T)]\end{aligned} \tag{22}$$

and

$$\begin{aligned}\|\Delta_{n+1}^{(\mu)}\|_M &\le (1-\gamma_nM)\|\Delta_n^{(\mu)}\|_M + \gamma_n[O(\gamma_n\|L_n^{(\mu)}\|_M + \gamma_n\|R_n^{(\mu)}\|_M)]\\ &\quad + \gamma_n[O(\|\rho_n^{(\mu)}\|_M + \|r_n^{(\mu)}\|_M + \|Q_{21}\Delta_n^{(\theta)}\|_M)].\end{aligned} \tag{23}$$

Remark 4. In the case where the functions $f$ and $g$ are linear and
when $\psi_n^{(\theta)} = 0$ and $\psi_n^{(\mu)} = 0$, the terms $\rho_n^{(\theta)}$ and $\rho_n^{(\mu)}$ equal zero; replacing
in (22) $\|L_n^{(\theta)}\|_T$ and $\|R_n^{(\theta)}\|_T$ by their upper bounds given in Lemmas 2 and
5 yields an upper bound $\delta_n^{(\theta)}$ of $\|\Delta_n^{(\theta)}\|_T$. Then, replacing in (23)
$\|L_n^{(\mu)}\|_M$ and $\|R_n^{(\mu)}\|_M$ by their upper bounds given in Lemmas 2 and 5,
and $\|Q_{21}\Delta_n^{(\theta)}\|_M$ by its upper bound $\delta_n^{(\theta)}$, yields an upper bound
of $\|\Delta_n^{(\mu)}\|_M$. Thus, in this particular framework, the proof of intermediate
upper bounds of $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$ is quite straightforward. Moreover, the
upper bounds of $(\Delta_n^{(\theta)})$ and $(\Delta_n^{(\mu)})$ obtained in this case are better than
those stated in Lemma 6 [compare (22) with (28) below, and (23) with (27)
below]; in particular, the knowledge of a preliminary upper bound $\delta_n^{(\mu)}$ of
the sequence $(\Delta_n^{(\mu)})$ is not necessary.

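By using the equivalence property of the finite-dimensional vector norms,
we note that, in view of (12), (16) and (19), we have

$$\begin{aligned}\|\rho_n^{(\theta)}\|_T + \|Q_{12}Q_{22}^{-1}\rho_n^{(\mu)}\|_T &= O(\|\rho_n^{(\theta)}\| + \|\rho_n^{(\mu)}\|)\\ &= O(\|\theta_n\|^2 + \|\mu_n\|^2)\\ &= O(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|\Delta_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\mu)}\|^2)\\ &= O(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|\Delta_n^{(\theta)}\|_T^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\mu)}\|_M^2).\end{aligned}$$

It thus follows from (22) that there exists $C_1 > 0$ such that, for $n$ large
enough,

$$\begin{aligned}\|\Delta_{n+1}^{(\theta)}\|_T &\le (1-\beta_nT)\|\Delta_n^{(\theta)}\|_T\\ &\quad + \beta_n[O(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \beta_nC_1(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|\Delta_n^{(\theta)}\|_T^2\\ &\qquad\qquad + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\mu)}\|_M^2).\end{aligned} \tag{24}$$

Similarly, we can deduce from (23) the existence of $C_2 > 0$ such that, for $n$
large enough,

$$\begin{aligned}\|\Delta_{n+1}^{(\mu)}\|_M &\le (1-\gamma_nM)\|\Delta_n^{(\mu)}\|_M\\ &\quad + \gamma_n[O(\gamma_n\|L_n^{(\mu)}\| + \gamma_n\|R_n^{(\mu)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \gamma_nC_2(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|\Delta_n^{(\theta)}\|_T^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2\\ &\qquad\qquad + \|\Delta_n^{(\mu)}\|_M^2 + \|\Delta_n^{(\theta)}\|_T).\end{aligned} \tag{25}$$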

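Now, let us note that, in view of assumption (A1), we have $\lim_{n\to\infty}\theta_n = 0$
and $\lim_{n\to\infty}\mu_n = 0$ a.s. Since $\lim_{n\to\infty}\beta_n\gamma_n^{-1} = 0$, Lemma 5 [applied with the
sequence $(w_n)\equiv 1$] implies that $\lim_{n\to\infty}R_n^{(\theta)} = 0$ and $\lim_{n\to\infty}R_n^{(\mu)} = 0$ a.s.
Noting that Lemma 2 ensures that $\lim_{n\to\infty}L_n^{(\theta)} = 0$ and $\lim_{n\to\infty}L_n^{(\mu)} = 0$
a.s., we deduce that $\lim_{n\to\infty}\Delta_n^{(\theta)} = 0$ and $\lim_{n\to\infty}\Delta_n^{(\mu)} = 0$ a.s. Set $T^*$ and
$M^*$ such that $0 < T^* < T$ and $0 < M^* < M$, respectively; we can then
deduce from (24) that, for $n$ large enough,

$$\begin{aligned}\|\Delta_{n+1}^{(\theta)}\|_T &\le (1-\beta_nT^*)\|\Delta_n^{(\theta)}\|_T\\ &\quad + \beta_nO[(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \beta_nC_1(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\mu)}\|_M^2)\end{aligned} \tag{26}$$

and from (25), that there exists $C_2' > 0$ such that, for $n$ large enough,

$$\begin{aligned}\|\Delta_{n+1}^{(\mu)}\|_M &\le (1-\gamma_nM^*)\|\Delta_n^{(\mu)}\|_M\\ &\quad + \gamma_n[O(\gamma_n\|L_n^{(\mu)}\| + \gamma_n\|R_n^{(\mu)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \gamma_nM^*C_2'(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\theta)}\|_T).\end{aligned} \tag{27}$$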
Remark 5. Let us note here that classical techniques make it possible to deduce
from (26) that if the sequence

$$(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\| + \|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\mu)}\|_M^2)$$

is bounded above by a suitable sequence $(w_n')$, then $\|\Delta_n^{(\theta)}\|_T$ can also be
bounded above by $(w_n')$. However, since the first upper bound of $\|\Delta_n^{(\mu)}\|_M$
which will be available in the body of the proof of Lemmas 3 and 4 (see
Section 2.3.2) is $\|\Delta_n^{(\mu)}\|_M = O(1)$, inequality (26) leads only to $\|\Delta_n^{(\theta)}\|_T =
O(1)$ (which has already been proved). The idea to deduce from (26) a better
upper bound for $\|\Delta_n^{(\theta)}\|_T$ is to resort to the averaging effect; for that, we
need to substitute $\gamma_n^{-1}[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M]$ for $\|\Delta_n^{(\mu)}\|_M^2$ in (26) [see (28)
and Remark 6 below].

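Inequality (27) allows us to write

$$\begin{aligned}\|\Delta_n^{(\mu)}\|_M &\le \frac{1}{\gamma_nM^*}[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M] + O(\gamma_n\|L_n^{(\mu)}\| + \gamma_n\|R_n^{(\mu)}\| + \|r_n^{(\mu)}\|)\\ &\quad + C_2'(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\theta)}\|_T).\end{aligned}$$

Set $\varepsilon > 0$ such that $0 < T^* - \varepsilon C_2'$; since $\lim_{n\to\infty}\Delta_n^{(\mu)} = 0$, we deduce
from (26) that, for $n$ large enough,

$$\begin{aligned}\|\Delta_{n+1}^{(\theta)}\|_T &\le (1-\beta_nT^*)\|\Delta_n^{(\theta)}\|_T\\ &\quad + \beta_n[O(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \beta_nC_1(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2) + \beta_n\varepsilon\|\Delta_n^{(\mu)}\|_M\\ &\le (1-\beta_nT^*)\|\Delta_n^{(\theta)}\|_T\\ &\quad + \beta_n[O(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \beta_nC_1(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2)\\ &\quad + \frac{\beta_n\varepsilon}{\gamma_nM^*}[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M]\\ &\quad + \beta_n[O(\gamma_n\|L_n^{(\mu)}\| + \gamma_n\|R_n^{(\mu)}\| + \|r_n^{(\mu)}\|)]\\ &\quad + \beta_n\varepsilon C_2'(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2 + \|\Delta_n^{(\theta)}\|_T).\end{aligned}$$

Setting $T^{**}$ such that $0 < T^{**} < T^* - \varepsilon C_2'$, we obtain, for $n$ large enough,

$$\begin{aligned}\|\Delta_{n+1}^{(\theta)}\|_T &\le (1-\beta_nT^{**})\|\Delta_n^{(\theta)}\|_T\\ &\quad + \beta_n[O(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\| + \gamma_n\|L_n^{(\mu)}\| + \gamma_n\|R_n^{(\mu)}\|)]\\ &\quad + \beta_n[O(\|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2)]\\ &\quad + \frac{\beta_n\varepsilon}{\gamma_nM^*}[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M].\end{aligned} \tag{28}$$

Classical computations (see Section A.4.2) then make it possible to deduce from (28)
that

$$\|\Delta_n^{(\theta)}\|_T = O(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}) + o(\sqrt{\beta_n}). \tag{29}$$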

Remark 6. Let us point out the averaging effect here again: the term
$\gamma_n^{-1}[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M]$
present in (28) leads to the bounding term $\beta_n\gamma_n^{-1}\delta_n^{(\mu)}$ in (29), although the
term $\|\Delta_n^{(\mu)}\|_M$ itself is bounded only by $\delta_n^{(\mu)}$.

To conclude the proof of Lemma 6, we substitute the upper bound
obtained in (29) for $\|\Delta_n^{(\theta)}\|_T$ in (27) and, via classical computations (see Section
A.4.3), establish that

$$\|\Delta_n^{(\mu)}\|_M = O(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}) + o(\sqrt{\beta_n}). \tag{30}$$

Lemma 6 then straightforwardly follows from the equivalence of the
finite-dimensional vector norms.

2.3.2. Body of the proof of Lemmas 3 and 4. Let $(w_n)$ be a sequence
satisfying Conditions (C) and (C'), and such that $\|\mu_n\| = O(w_n)$ a.s. In the
proof of Lemma 6, we have seen that $\lim_{n\to\infty}\Delta_n^{(\mu)} = 0$ a.s. We can thus
apply Lemma 6 with $(\delta_n^{(\mu)})\equiv 1$, which ensures that

$$\|\Delta_n^{(\theta)}\| + \|\Delta_n^{(\mu)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}) + o(\sqrt{\beta_n}) \quad\text{a.s.}$$

Now, let $k$ be a positive integer, and assume that

$$\|\Delta_n^{(\theta)}\| + \|\Delta_n^{(\mu)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2 + [\beta_n\gamma_n^{-1}]^k) + o(\sqrt{\beta_n}) \quad\text{a.s.}$$

Since $(w_n)$ satisfies Conditions (C) and (C'), the sequence
$(\delta_n^{(\mu)}) = (\beta_n^2\gamma_n^{-2}w_n^2 + [\beta_n\gamma_n^{-1}]^k + \beta_n^{1/2})$ also satisfies Conditions (C) and (C'); it follows from the
application of Lemma 6 that

$$\|\Delta_n^{(\theta)}\| + \|\Delta_n^{(\mu)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2 + [\beta_n\gamma_n^{-1}]^{k+1}) + o(\sqrt{\beta_n}) \quad\text{a.s.}$$

We have thus proved by induction that, for all integers $j \ge 1$,

$$\|\Delta_n^{(\theta)}\| + \|\Delta_n^{(\mu)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2 + [\beta_n\gamma_n^{-1}]^j) + o(\sqrt{\beta_n}) \quad\text{a.s.}$$

Since Assumption (A3) ensures the existence of $j_0$ such that $[\beta_n\gamma_n^{-1}]^{j_0} =
o(\beta_n^{1/2})$, we have proved that, for any sequence $(w_n)$ satisfying Conditions
(C) and (C') and such that $\|\mu_n\| = O(w_n)$, we have

$$\|\Delta_n^{(\theta)}\| + \|\Delta_n^{(\mu)}\| = O(\beta_n^2\gamma_n^{-2}w_n^2) + o(\sqrt{\beta_n}) \quad\text{a.s.} \tag{31}$$
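Set $k \ge 0$, and assume that

$$\|\mu_n\| = O(\sqrt{\gamma_n\log s_n} + [\beta_n\gamma_n^{-1}]^k) \quad\text{a.s.} \tag{32}$$

Since the sequence $(\sqrt{\gamma_n\log s_n} + [\beta_n\gamma_n^{-1}]^k)$ satisfies Conditions (C) and (C'),
the application of Lemma 2, and of Lemma 5 and (31) with
$(w_n) = (\sqrt{\gamma_n\log s_n} + [\beta_n\gamma_n^{-1}]^k)$, ensures that

$$\begin{aligned}\|\mu_n\| &= O(\|L_n^{(\mu)}\| + \|R_n^{(\mu)}\| + \|\Delta_n^{(\mu)}\|)\\ &= O\Big(\sqrt{\gamma_n\log s_n} + \big[\beta_n\gamma_n^{-1}\big(\sqrt{\gamma_n\log s_n} + [\beta_n\gamma_n^{-1}]^k\big) + \sqrt{\beta_n\log u_n}\big]\\ &\qquad + \beta_n^2\gamma_n^{-2}\big(\sqrt{\gamma_n\log s_n} + [\beta_n\gamma_n^{-1}]^k\big)^2\Big) + o(\sqrt{\beta_n}) \quad\text{a.s.}\\ &= O(\sqrt{\gamma_n\log s_n} + [\beta_n\gamma_n^{-1}]^{k+1}) \quad\text{a.s. [in view of (A3)].}\end{aligned}$$

Now, in view of assumption (A1), we have $\|\mu_n\| = o(1)$ a.s., so that (32)
is satisfied for $k = 0$. We have thus proved by induction that (32) holds
for all $k \ge 0$. Since (A3) ensures the existence of $k_0$ such that $[\beta_n\gamma_n^{-1}]^{k_0} =
o(\sqrt{\gamma_n\log s_n})$, it follows that $\|\mu_n\| = O(\sqrt{\gamma_n\log s_n})$ a.s.

Remark 7. This latter upper bound of $(\mu_n)$ proves the second assertion
of Proposition 1.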

To conclude the proof of Lemma 3, we now apply Lemma 5 with $(w_n) =
(\sqrt{\gamma_n\log s_n})$:

• For all $s\in\,]1/2,\beta_0\lambda^{(H)}[$, we have

$$\begin{aligned}\|R_n^{(\theta)}\| &= O(\beta_n\gamma_n^{-1/2}\sqrt{\log s_n} + n^{-s}) \quad\text{a.s.}\\ &= O(n^{-(b-a/2)}\sqrt{\log n} + n^{-s}) \quad\text{a.s.},\end{aligned}$$

with, in view of (A3), $b - a/2 > b/2$; the first part of Lemma 3 follows.

• We have

$$\|R_n^{(\mu)}\| = O([\beta_n\gamma_n^{-1}]^{1/2}\sqrt{\beta_n\log s_n} + \sqrt{\beta_n\log u_n}) \quad\text{a.s.},$$

which, in view of (A3), gives the second part of Lemma 3.

To conclude the proof of Lemma 4, we apply (31) with $(w_n) = (\sqrt{\gamma_n\log s_n})$,
which gives

$$\|\Delta_n^{(\theta)}\| + \|\Delta_n^{(\mu)}\| = O([\beta_n\gamma_n^{-1}][\beta_n\log s_n]) + o(\sqrt{\beta_n}) \quad\text{a.s.}$$

In view of (A3), Lemma 4 follows.

3. The averaging principle in the context of two-time-scale stochastic 
approximation algorithms. 

3.1. Asymptotic efficiency of two-time-scale stochastic approximation
algorithms. The averaging principle has been introduced simultaneously by
Ruppert [26] and Polyak [24] in the framework of single-time-scale stochastic
approximation algorithms, and their pioneering work has been widely discussed
and extended in this context (see, among many others, Yin [27], Delyon and
Juditsky [6], Polyak and Juditsky [25], Kushner and Yang [16], Dippon and
Renz [7, 8], Duflo [9], Kushner and Yin [17] and Pelletier [23]). Let us recall
that the foundations of this principle are the following: (i) there exists an
algorithm which converges with the optimal rate; however, in general, this
"optimal algorithm" cannot be used because it depends on an unknown
parameter; (ii) taking a suitable average of a slowly converging algorithm leads
to an "averaged algorithm," which has the same asymptotic behavior as the
"optimal algorithm."

To introduce the averaging principle in the context of two-time-scale 
stochastic approximation algorithms, we first need to define the notion of 
asymptotic efficiency in this framework, that is, to find out what the opti- 
mal convergence rate of the two-time-scale algorithms is. For that purpose, 
we follow the approach employed in the framework of the single-time-scale 
stochastic approximation algorithms, and consider the class of matricial and 
two-time-scale algorithms defined as 

Ag 

(33) 6 n+ i = 9 n H X n+ i, 

n 

(34) fi n+1 = fji n + — £ Y n+1 , 

n a 

where $a \in\, ]1/2, 1[$, and where $A_\theta$ (resp. $A_\mu$) is a $d \times d$
(resp. $d' \times d'$) nonsingular matrix such that the matrix $A_\theta H + I/2$
(resp. $A_\mu Q_{22}$) is attractive [recall that $H$ and $Q_{22}$ are defined in
(A2)]. Following the computations made in the beginning of Section 2.3, and
setting $(\beta_n) = (n^{-1})$ and $(\gamma_n) = (n^{-a})$, we rewrite (33)-(34) as

$$\theta_{n+1} = \theta_n + A_\theta \beta_n (Q_{11}\theta_n + Q_{12}\mu_n + \rho_n^{(\theta)} + V_{n+1}), \tag{35}$$

$$\mu_{n+1} = \mu_n + A_\mu \gamma_n (Q_{21}\theta_n + Q_{22}\mu_n + \rho_n^{(\mu)} + W_{n+1}). \tag{36}$$

From (36), we get

$$\mu_n = Q_{22}^{-1}A_\mu^{-1}\gamma_n^{-1}(\mu_{n+1} - \mu_n) - Q_{22}^{-1}Q_{21}\theta_n - Q_{22}^{-1}\rho_n^{(\mu)} - Q_{22}^{-1}W_{n+1},$$

which, reintroduced in (35), gives

$$\theta_{n+1} = \theta_n + \beta_n (A_\theta H)\theta_n + \beta_n (A_\theta Q_{12}Q_{22}^{-1}A_\mu^{-1})\gamma_n^{-1}[\mu_{n+1} - \mu_n]$$
$$\qquad + \beta_n A_\theta (V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1}) + \beta_n A_\theta (\rho_n^{(\theta)} - Q_{12}Q_{22}^{-1}\rho_n^{(\mu)}).$$
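The elimination of $\mu_n$ between (35) and (36) can be checked numerically. The following is a minimal sketch in the scalar case $d = d' = 1$; all numerical values are arbitrary illustrative choices, not taken from the paper, and the variable names are ours.

```python
# Scalar (d = d' = 1) check that eliminating mu_n between (35) and (36)
# reproduces the one-step update (35). All values are illustrative.
q11, q12, q21, q22 = -1.0, 0.5, 0.4, -2.0
a_theta, a_mu = 1.3, 0.7          # nonzero gains A_theta, A_mu
beta, gamma = 0.01, 0.05          # step sizes beta_n, gamma_n
theta, mu = 0.3, -0.2             # current iterates theta_n, mu_n
rho_t, rho_m = 0.02, -0.01        # remainder terms rho_n^(theta), rho_n^(mu)
v, w = 0.5, -0.8                  # noise terms V_{n+1}, W_{n+1}

# Direct recursions (35) and (36).
theta_next = theta + a_theta * beta * (q11 * theta + q12 * mu + rho_t + v)
mu_next = mu + a_mu * gamma * (q21 * theta + q22 * mu + rho_m + w)

# Substituted form, with H = Q11 - Q12 Q22^{-1} Q21.
h = q11 - q12 * q21 / q22
theta_sub = (theta
             + beta * a_theta * h * theta
             + beta * (a_theta * q12 / (q22 * a_mu)) * (mu_next - mu) / gamma
             + beta * a_theta * (v - q12 / q22 * w)
             + beta * a_theta * (rho_t - q12 / q22 * rho_m))

# The two expressions agree up to floating-point rounding.
print(abs(theta_next - theta_sub))
```

The agreement does not depend on the particular values chosen, since the substitution is a purely algebraic identity.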
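Following the proof of Theorem 1, we obtain

$$\sqrt{n}(\theta_n - \theta^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, \Sigma_\theta(A_\theta)),$$

where $\Sigma_\theta(A_\theta)$ is the solution of the Lyapounov equation

$$\Big[A_\theta H + \frac{I}{2}\Big]\Sigma_\theta(A_\theta) + \Sigma_\theta(A_\theta)\Big[H^T A_\theta^T + \frac{I}{2}\Big] = -A_\theta \Gamma_\theta A_\theta^T$$

[$\Gamma_\theta$ being defined in (7)]. Classical computations (see, e.g., [9], page 166)
ensure that the optimal choice of $A_\theta$ in (33) is $A_\theta = -H^{-1}$, which leads to
the optimal asymptotic covariance matrix
$\Sigma_\theta(A_\theta) = H^{-1}\Gamma_\theta[H^{-1}]^T$, and to the following CLT for $\theta_n$:

$$\sqrt{n}(\theta_n - \theta^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, H^{-1}\Gamma_\theta[H^{-1}]^T). \tag{37}$$

Therefore, one of the conditions we shall require to say that a general
two-time-scale stochastic approximation algorithm of the type (1)-(2) is
asymptotically efficient is that its fastest component $\theta_n$ satisfies the CLT
(37).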
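In the scalar case $d = 1$, the Lyapounov equation above can be solved in closed form, which makes the optimality of $A_\theta = -H^{-1}$ easy to check. The sketch below assumes hypothetical values for $H$ and $\Gamma_\theta$; the function name is ours.

```python
def asymptotic_variance(a, h, gamma_theta):
    """Scalar solution S of (a*h + 1/2)*S + S*(a*h + 1/2) = -a**2 * gamma_theta.

    Requires a*h + 1/2 < 0, the scalar version of "A_theta H + I/2 is attractive".
    """
    drift = a * h + 0.5
    if drift >= 0:
        raise ValueError("a*h + 1/2 must be negative")
    return -a * a * gamma_theta / (2.0 * drift)

# Hypothetical values of H and Gamma_theta (illustration only).
h, gamma_theta = -0.75, 1.25

a_opt = -1.0 / h                                   # gain A_theta = -H^{-1}
best = asymptotic_variance(a_opt, h, gamma_theta)  # equals Gamma_theta / H**2

# Compare with other admissible gains (those with a > a_opt / 2).
grid = [a_opt * (0.6 + 0.05 * i) for i in range(40)]
others = [asymptotic_variance(a, h, gamma_theta) for a in grid]
print(best, min(others))
```

In this scalar setting the variance is $a^2\Gamma_\theta/(2a|H| - 1)$, which is minimized at $a = 1/|H|$, in agreement with the optimal covariance $H^{-1}\Gamma_\theta[H^{-1}]^T$.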

Now, the idea to find out the optimal weak convergence rate for the slowest
component $\mu_n$ of the two-time-scale stochastic approximation algorithm
(1)-(2) is the following. First, we invert the roles of $\theta_n$ and $\mu_n$, that is,
we give to $\mu_n$ the position of the fastest component, and consider the
following alternative algorithm to the algorithm (33)-(34):

$$\theta_{n+1} = \theta_n + \frac{A_\theta}{n^a} X_{n+1},$$

$$\mu_{n+1} = \mu_n + \frac{A_\mu}{n} Y_{n+1},$$

where $a \in\, ]1/2, 1[$. Then, we apply the results previously obtained for the
matricial two-time-scale stochastic approximation algorithm (33)-(34). Set

$$G = Q_{22} - Q_{21}Q_{11}^{-1}Q_{12}, \tag{38}$$

$$\Gamma_\mu = \lim_{n\to\infty} E\big([W_{n+1} - Q_{21}Q_{11}^{-1}V_{n+1}][W_{n+1} - Q_{21}Q_{11}^{-1}V_{n+1}]^T \,\big|\, \mathcal{F}_n\big)$$
$$\quad = \Gamma_{22} + Q_{21}Q_{11}^{-1}\Gamma_{11}[Q_{11}^{-1}]^T Q_{21}^T - \Gamma_{21}[Q_{11}^{-1}]^T Q_{21}^T - Q_{21}Q_{11}^{-1}\Gamma_{12}, \tag{39}$$

and assume that the matrices $A_\mu G + I/2$ and $A_\theta Q_{11}$ are attractive.
Following the proof of (37), we deduce that the optimal choice of $A_\mu$ is
$A_\mu = -G^{-1}$, which leads to the optimal covariance matrix
$G^{-1}\Gamma_\mu[G^{-1}]^T$ and to the following CLT for $\mu_n$:

$$\sqrt{n}(\mu_n - \mu^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, G^{-1}\Gamma_\mu[G^{-1}]^T).$$

We can now precisely define the notion of asymptotic efficiency for
two-time-scale stochastic approximation algorithms.

Definition 3. Let $(\tilde\theta_n, \tilde\mu_n)$ be given by a two-time-scale stochastic
approximation algorithm used for the search of the common zero $(\theta^*, \mu^*)$ of two
functions $f$ and $g$. Assume that $f$ and $g$ satisfy assumption (A2)(i), and that
the error-contaminated observations $(X_{n+1})$ and $(Y_{n+1})$ of $f(\tilde\theta_n, \tilde\mu_n)$ and
$g(\tilde\theta_n, \tilde\mu_n)$ satisfy assumption (A4). We say that the two-time-scale
stochastic approximation algorithm which defines $(\tilde\theta_n, \tilde\mu_n)$ is asymptotically
efficient if the two following properties hold:

(P1) $\sqrt{n}(\tilde\theta_n - \theta^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, H^{-1}\Gamma_\theta[H^{-1}]^T)$,

(P2) $\sqrt{n}(\tilde\mu_n - \mu^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, G^{-1}\Gamma_\mu[G^{-1}]^T)$,

where $H$, $\Gamma_\theta$, $G$ and $\Gamma_\mu$ are defined in (6), (7), (38) and (39), respectively.

Let us note that a sequence $(\tilde\theta_n, \tilde\mu_n)$ satisfying properties (P1) and (P2)
can be obtained, under suitable assumptions, by simultaneously running the
two following two-time-scale stochastic approximation algorithms:

$$\tilde\theta_{n+1} = \tilde\theta_n - \frac{H^{-1}}{n} X_{n+1}^{(1)}, \qquad \mu_{n+1} = \mu_n + \frac{1}{n^a} Y_{n+1}^{(1)}$$

and

$$\theta_{n+1} = \theta_n + \frac{1}{n^a} X_{n+1}^{(2)}, \qquad \tilde\mu_{n+1} = \tilde\mu_n - \frac{G^{-1}}{n} Y_{n+1}^{(2)},$$

where $X_{n+1}^{(1)}$, $Y_{n+1}^{(1)}$, $X_{n+1}^{(2)}$ and $Y_{n+1}^{(2)}$ are error-contaminated observations of
$f(\tilde\theta_n, \mu_n)$, $g(\tilde\theta_n, \mu_n)$, $f(\theta_n, \tilde\mu_n)$ and $g(\theta_n, \tilde\mu_n)$, respectively. However, this
procedure has two main drawbacks. The first one (which is minor) is that it
doubles the number of necessary observations. The second one (which is
much more important) is that, most of the time, this procedure cannot be
used, the matrices $H$ and $G$ being usually unknown.



3.2. Averaging of two-time-scale stochastic approximation algorithms. We
can now introduce the averaged two-time-scale stochastic approximation
algorithm. Applying the averaging principle, we first define the slowly
converging two-time-scale algorithm. For that purpose, we let the sequence
$(\theta_n, \mu_n)$ be still defined by the recursive equations (1)-(2), but, this time,
the step sizes $(\beta_n)$ and $(\gamma_n)$ fulfill the following assumption:

(A'3) $(\beta_n) = (\beta_0 n^{-b})$ and $(\gamma_n) = (\gamma_0 n^{-a})$ with $\beta_0 > 0$, $\gamma_0 > 0$, and
$1/2 < a < b < 1$.

We then define the averages of $\theta_k$ and $\mu_k$ by setting

$$\bar\theta_n = \frac{1}{n}\sum_{k=1}^n \theta_k \quad\text{and}\quad \bar\mu_n = \frac{1}{n}\sum_{k=1}^n \mu_k. \tag{40}$$
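As an illustration, the recursion (1)-(2) with step sizes of the form (A'3) and the averages (40) can be sketched as follows. This is a toy simulation under assumptions of our own: a scalar linear problem $f(\theta,\mu) = -\theta + 0.5\mu$, $g(\theta,\mu) = 0.5\theta - \mu$ with common zero $(0,0)$, standard Gaussian observation noise, $\beta_0 = \gamma_0 = 1$, $a = 0.6$ and $b = 0.8$. None of these choices comes from the paper.

```python
import random

def run_averaged_sa(n_iter=20000, a=0.6, b=0.8, seed=0):
    """Scalar two-time-scale SA (1)-(2) with running averages as in (40).

    Hypothetical test problem: f(theta, mu) = -theta + 0.5*mu and
    g(theta, mu) = 0.5*theta - mu, whose common zero is (0, 0).
    """
    rng = random.Random(seed)
    theta, mu = 1.0, 1.0              # theta_1, mu_1
    theta_bar, mu_bar = 0.0, 0.0
    for n in range(1, n_iter + 1):
        # Running averages of theta_1..theta_n and mu_1..mu_n.
        theta_bar += (theta - theta_bar) / n
        mu_bar += (mu - mu_bar) / n
        # Step sizes beta_n = n^{-b}, gamma_n = n^{-a}, with 1/2 < a < b < 1.
        beta_n, gamma_n = n ** (-b), n ** (-a)
        # Error-contaminated observations X_{n+1}, Y_{n+1}.
        x_obs = -theta + 0.5 * mu + rng.gauss(0.0, 1.0)
        y_obs = 0.5 * theta - mu + rng.gauss(0.0, 1.0)
        theta, mu = theta + beta_n * x_obs, mu + gamma_n * y_obs
    return theta, mu, theta_bar, mu_bar

theta_n, mu_n, theta_bar, mu_bar = run_averaged_sa()
print(theta_n, mu_n, theta_bar, mu_bar)
```

The averages are updated recursively, so the whole trajectory never needs to be stored.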

To establish the joint weak convergence rate of $(\bar\theta_n)$ and $(\bar\mu_n)$, we need to
strengthen assumption (A4) into the following condition:

(A'4) Assumption (A4) is fulfilled with $\|r_n^{(\theta)}\| + \|r_n^{(\mu)}\| = o(n^{-1/2})$.

Our main result in this section is the following theorem.



Theorem 2 [Joint weak convergence rate of $(\bar\theta_n)$ and $(\bar\mu_n)$]. Let $(\theta_n, \mu_n)$
be defined by the recursive equations (1)-(2), and $(\bar\theta_n, \bar\mu_n)$ by (40). Under
assumptions (A1), (A2), (A'3) and (A'4), we have

$$\sqrt{n}\begin{pmatrix} \bar\theta_n - \theta^* \\ \bar\mu_n - \mu^* \end{pmatrix} \xrightarrow{\mathcal{D}} \mathcal{N}(0, DP\Gamma P^T D^T), \tag{41}$$

where $\Gamma$ is defined in (A4)(ii), and where

$$D = \begin{pmatrix} H^{-1} & 0 \\ 0 & G^{-1} \end{pmatrix}, \qquad P = \begin{pmatrix} I & -Q_{12}Q_{22}^{-1} \\ -Q_{21}Q_{11}^{-1} & I \end{pmatrix}.$$

In particular, the averaged two-time-scale stochastic approximation
algorithm $(\bar\theta_n, \bar\mu_n)$ is asymptotically efficient.

3.3. Proof of Theorem 2. Let us first note that the CLT (41) implies, in
particular, that

$$\sqrt{n}(\bar\theta_n - \theta^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, H^{-1}\Gamma_\theta[H^{-1}]^T),$$
$$\sqrt{n}(\bar\mu_n - \mu^*) \xrightarrow{\mathcal{D}} \mathcal{N}(0, G^{-1}\Gamma_\mu[G^{-1}]^T),$$

which proves the asymptotic efficiency of the averaged algorithm $(\bar\theta_n, \bar\mu_n)$.
We now prove (41).
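The fact that the diagonal blocks of $DP\Gamma P^T D^T$ are the efficient covariances of (P1) and (P2) can be checked numerically. The sketch below treats the case $d = d' = 1$ with arbitrary illustrative values for the $Q_{ij}$ and for $\Gamma$; the helper functions are ours.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Illustrative scalar blocks (d = d' = 1); these values are assumptions.
q11, q12, q21, q22 = -1.0, 0.5, 0.4, -2.0
g11, g12, g22 = 1.0, 0.3, 2.0
gamma = [[g11, g12], [g12, g22]]          # covariance Gamma of (V, W)

h = q11 - q12 * q21 / q22                 # H = Q11 - Q12 Q22^{-1} Q21
g = q22 - q21 * q12 / q11                 # G = Q22 - Q21 Q11^{-1} Q12
d = [[1.0 / h, 0.0], [0.0, 1.0 / g]]
p = [[1.0, -q12 / q22], [-q21 / q11, 1.0]]

dp = matmul(d, p)
cov = matmul(matmul(dp, gamma), transpose(dp))   # D P Gamma P^T D^T

# Gamma_theta as in (7) and Gamma_mu as in (39), written for scalars.
gamma_theta = g11 + (q12 / q22) ** 2 * g22 - 2 * (q12 / q22) * g12
gamma_mu = g22 + (q21 / q11) ** 2 * g11 - 2 * (q21 / q11) * g12

print(cov[0][0], gamma_theta / h ** 2)
print(cov[1][1], gamma_mu / g ** 2)
```

Each printed pair coincides up to rounding, which is the scalar form of the two marginal CLTs stated above.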

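We assume again, without loss of generality, that $\theta^* = 0$ and $\mu^* = 0$. In
the beginning of Section 2.3 we have seen that [see (13)]

$$\theta_{n+1} = \theta_n + \beta_n H\theta_n + \beta_n Q_{12}Q_{22}^{-1}\gamma_n^{-1}[\mu_{n+1} - \mu_n] + \beta_n(V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1})$$
$$\qquad + \beta_n\big([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]\big).$$

We can thus write

$$\theta_n = -H^{-1}(V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1}) + H^{-1}\beta_n^{-1}[\theta_{n+1} - \theta_n] - H^{-1}Q_{12}Q_{22}^{-1}\gamma_n^{-1}[\mu_{n+1} - \mu_n]$$
$$\qquad - H^{-1}\big([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]\big),$$

so that

$$\bar\theta_n = H^{-1}\Big(-\frac{1}{n}\sum_{k=1}^n (V_{k+1} - Q_{12}Q_{22}^{-1}W_{k+1}) + \mathcal{R}_n^{(1)} - \mathcal{R}_n^{(2)} - \mathcal{R}_n^{(3)} - \mathcal{R}_n^{(4)}\Big),$$

with

$$\mathcal{R}_n^{(1)} = \frac{1}{n}\sum_{k=1}^n \beta_k^{-1}[\theta_{k+1} - \theta_k], \qquad \mathcal{R}_n^{(2)} = \frac{1}{n}\sum_{k=1}^n Q_{12}Q_{22}^{-1}\gamma_k^{-1}[\mu_{k+1} - \mu_k],$$
$$\mathcal{R}_n^{(3)} = \frac{1}{n}\sum_{k=1}^n \big(\rho_k^{(\theta)} - Q_{12}Q_{22}^{-1}\rho_k^{(\mu)}\big), \qquad \mathcal{R}_n^{(4)} = \frac{1}{n}\sum_{k=1}^n \big(r_k^{(\theta)} - Q_{12}Q_{22}^{-1}r_k^{(\mu)}\big).$$

Similarly, we have

$$\bar\mu_n = G^{-1}\Big(-\frac{1}{n}\sum_{k=1}^n (W_{k+1} - Q_{21}Q_{11}^{-1}V_{k+1}) + \mathcal{R}_n^{(5)} - \mathcal{R}_n^{(6)} - \mathcal{R}_n^{(7)} - \mathcal{R}_n^{(8)}\Big),$$

with

$$\mathcal{R}_n^{(5)} = \frac{1}{n}\sum_{k=1}^n \gamma_k^{-1}[\mu_{k+1} - \mu_k], \qquad \mathcal{R}_n^{(6)} = \frac{1}{n}\sum_{k=1}^n Q_{21}Q_{11}^{-1}\beta_k^{-1}[\theta_{k+1} - \theta_k],$$
$$\mathcal{R}_n^{(7)} = \frac{1}{n}\sum_{k=1}^n \big(\rho_k^{(\mu)} - Q_{21}Q_{11}^{-1}\rho_k^{(\theta)}\big), \qquad \mathcal{R}_n^{(8)} = \frac{1}{n}\sum_{k=1}^n \big(r_k^{(\mu)} - Q_{21}Q_{11}^{-1}r_k^{(\theta)}\big).$$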

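A straightforward application of Lyapounov's theorem gives the following
lemma:

Lemma 7.

$$\frac{1}{\sqrt{n}}\sum_{k=1}^n \begin{pmatrix} V_{k+1} - Q_{12}Q_{22}^{-1}W_{k+1} \\ W_{k+1} - Q_{21}Q_{11}^{-1}V_{k+1} \end{pmatrix} \xrightarrow{\mathcal{D}} \mathcal{N}(0, P\Gamma P^T).$$

The CLT (41) thus follows from the combination of Lemma 7 and of the
following lemma.

Lemma 8. For $i \in \{1, \ldots, 8\}$, we have

$$\lim_{n\to\infty} \sqrt{n}\,\mathcal{R}_n^{(i)} = 0 \quad \text{a.s.}$$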

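Proof. The application of Proposition 1 gives

$$\frac{1}{\sqrt{n}}\Big\|\sum_{k=1}^n \beta_k^{-1}[\theta_{k+1} - \theta_k]\Big\| = O\Big(\frac{1}{\sqrt{n}}\Big[\beta_n^{-1}\|\theta_{n+1}\| + \beta_1^{-1}\|\theta_1\| + \sum_{k=2}^n \Big|\frac{1}{\beta_{k-1}} - \frac{1}{\beta_k}\Big|\,\|\theta_k\|\Big]\Big) \quad \text{a.s.}$$
$$= O\Big(\frac{1}{\sqrt{n}}\Big[\sqrt{\beta_n^{-1}\log u_n} + \sum_{k=1}^n k^{b-1}\sqrt{\beta_k \log u_k}\Big]\Big) \quad \text{a.s.}$$
$$= O(n^{b/2 - 1/2}\log n) \quad \text{a.s.}$$

Since $b < 1$, it follows that $\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(1)} = 0$ and
$\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(6)} = 0$ a.s. In the same way, we have

$$\frac{1}{\sqrt{n}}\Big\|\sum_{k=1}^n \gamma_k^{-1}[\mu_{k+1} - \mu_k]\Big\| = O\Big(\frac{1}{\sqrt{n}}\Big[\gamma_n^{-1}\|\mu_{n+1}\| + \gamma_1^{-1}\|\mu_1\| + \sum_{k=2}^n \Big|\frac{1}{\gamma_{k-1}} - \frac{1}{\gamma_k}\Big|\,\|\mu_k\|\Big]\Big) \quad \text{a.s.}$$
$$= O\Big(\frac{1}{\sqrt{n}}\Big[\sqrt{\gamma_n^{-1}\log s_n} + \sum_{k=1}^n k^{a-1}\sqrt{\gamma_k \log s_k}\Big]\Big) \quad \text{a.s.}$$
$$= O(n^{-(1-a)/2}\log n) \quad \text{a.s.}$$

Since $a < 1$, it follows that $\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(2)} = 0$ and
$\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(5)} = 0$ a.s. Now, we note that

$$\frac{1}{\sqrt{n}}\sum_{k=1}^n \big[\|\theta_k\|^2 + \|\mu_k\|^2\big] = O\Big(\frac{1}{\sqrt{n}}\sum_{k=1}^n \gamma_k \log s_k\Big) \quad \text{a.s.}$$
$$= O(n^{1/2 - a}\log n) \quad \text{a.s.}$$

Since $a > 1/2$ and in view of (12), it follows that
$\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(3)} = 0$ and $\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(7)} = 0$ a.s.
Finally, assumption (A'4) ensures that $\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(4)} = 0$ and
$\lim_{n\to\infty}\sqrt{n}\,\mathcal{R}_n^{(8)} = 0$ a.s. □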
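The bounds in the proof of Lemma 8 rest on a summation by parts of $\sum_k \beta_k^{-1}[\theta_{k+1} - \theta_k]$. The identity can be checked on any sequence; the sketch below uses an arbitrary deterministic sequence in place of $(\theta_k)$ and takes $\beta_k = k^{-b}$ (that is, $\beta_0 = 1$), both of which are assumptions made only for illustration.

```python
import math

b, n = 0.8, 500
# 1-indexed sequences: beta[k] = k^{-b} and an arbitrary sequence theta[k].
beta = [None] + [k ** (-b) for k in range(1, n + 2)]
theta = [None] + [math.sin(k) * k ** (-0.4) for k in range(1, n + 2)]

# Left-hand side: sum_{k=1}^{n} beta_k^{-1} (theta_{k+1} - theta_k).
lhs = sum((theta[k + 1] - theta[k]) / beta[k] for k in range(1, n + 1))

# Summation by parts: boundary terms plus the weighted sum of theta_k.
rhs = (theta[n + 1] / beta[n]
       - theta[1] / beta[1]
       + sum((1.0 / beta[k - 1] - 1.0 / beta[k]) * theta[k]
             for k in range(2, n + 1)))

print(abs(lhs - rhs))
```

Since $|\beta_{k-1}^{-1} - \beta_k^{-1}| = O(k^{b-1})$, the right-hand side is controlled as soon as an almost sure bound on $\|\theta_k\|$ is available.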

APPENDIX 

A.1. Two technical lemmas.

Lemma 9. Let $(x_n)$ be a sequence of positive real numbers, let $(\nu_n)$ be
an $\mathbb{R}^d$-valued random sequence such that $\|\nu_n\| = O(x_n)$ a.s., set $T > 0$,

$$Z_n^{(1)} = e^{-u_n T}\sum_{k=1}^n e^{u_k T}\beta_k x_k \quad\text{and}\quad Z_n^{(2)} = e^{u_n H}\sum_{k=1}^n e^{-u_k H}\beta_k \nu_k.$$

Let $(w_n)$ be a nonrandom sequence satisfying Condition (C).

1. For all $T' \in\, ]0, T[$, we have

$$|Z_n^{(1)}| = \begin{cases} O(e^{-u_n T'}\mathbb{1}_{b=1} + w_n), & \text{if } x_n = O(w_n), \\ o(e^{-u_n T'}\mathbb{1}_{b=1} + w_n), & \text{if } x_n = o(w_n). \end{cases}$$

2. For all $T' \in\, ]0, \Lambda^{(H)}[$, we have

$$\|Z_n^{(2)}\| = \begin{cases} O(e^{-u_n T'}\mathbb{1}_{b=1} + w_n), & \text{if } x_n = O(w_n), \\ o(e^{-u_n T'}\mathbb{1}_{b=1} + w_n), & \text{if } x_n = o(w_n). \end{cases}$$

Lemma 10. Let $(x_n)$ be a sequence of positive real numbers, let $(\nu_n)$ be
an $\mathbb{R}^{d'}$-valued random sequence such that $\|\nu_n\| = O(x_n)$ a.s., set $T > 0$,

$$Z_n^{(1)} = e^{-s_n T}\sum_{k=1}^n e^{s_k T}\gamma_k x_k \quad\text{and}\quad Z_n^{(2)} = e^{s_n Q_{22}}\sum_{k=1}^n e^{-s_k Q_{22}}\gamma_k \nu_k.$$

Let $(w_n)$ be a nonrandom sequence satisfying Condition (C). We have

$$|Z_n^{(1)}| + \|Z_n^{(2)}\| = \begin{cases} O(w_n), & \text{if } x_n = O(w_n), \\ o(w_n), & \text{if } x_n = o(w_n). \end{cases}$$



Proof of Lemma 9. We first establish the upper bound of $(Z_n^{(1)})$.

Consider the case $b = 1$, that is, $(\beta_n) = (\beta_0 n^{-1})$. In the case $x_n = o(w_n)$,
we have

$$|Z_n^{(1)}| = O\Big(n^{-\beta_0 T}\sum_{k=1}^n k^{\beta_0 T - 1}x_k\Big)$$
$$= o\Big(n^{-\beta_0 T}\sum_{k=1}^n k^{\beta_0 T - 1 - w}\mathcal{L}(k)\Big)$$
$$= o\big(n^{-\beta_0 T}[\log n + n^{\beta_0 T - w}]\mathcal{L}(n)\big)$$
$$= o\big(n^{-\beta_0 T}\mathcal{L}(n)\log n + w_n\big).$$

Since $\mathcal{L}$ is a slowly varying function, it follows that, for all $T' \in\, ]0, T[$,

$$|Z_n^{(1)}| = o(n^{-\beta_0 T'} + w_n) = o(e^{-u_n T'} + w_n).$$

In the case $x_n = O(w_n)$, the upper bound of $(Z_n^{(1)})$ is obtained by replacing
$o(\cdot)$ by $O(\cdot)$ in the previous equations.

Consider the case $b < 1$. We note that the sequence $(Z_n^{(1)})$ satisfies the
recursive equation

$$Z_n^{(1)} = e^{-\beta_n T}Z_{n-1}^{(1)} + \beta_n x_n,$$

so that we can write

$$w_n^{-1}Z_n^{(1)} = e^{-\beta_n T}\frac{w_{n-1}}{w_n}\big(w_{n-1}^{-1}Z_{n-1}^{(1)}\big) + \beta_n w_n^{-1}x_n$$
$$= [1 - \beta_n T + O(\beta_n^2)][1 + o(\beta_n)]\big(w_{n-1}^{-1}Z_{n-1}^{(1)}\big) + \beta_n w_n^{-1}x_n$$
$$= [1 - \beta_n T + o(\beta_n)]\big(w_{n-1}^{-1}Z_{n-1}^{(1)}\big) + \beta_n w_n^{-1}x_n.$$

Now, set $T' \in\, ]0, T[$; for $n$ large enough, we get

$$|w_n^{-1}Z_n^{(1)}| \le (1 - \beta_n T')|w_{n-1}^{-1}Z_{n-1}^{(1)}| + \beta_n w_n^{-1}x_n,$$

and the application of Lemma 4.1.2 of [9] ensures that if $x_n = O(w_n)$,
then the sequence $(w_n^{-1}Z_n^{(1)})$ is bounded, that is, $|Z_n^{(1)}| = O(w_n)$; if
$x_n = o(w_n)$, then the sequence $(w_n^{-1}Z_n^{(1)})$ goes to zero, that is,
$|Z_n^{(1)}| = o(w_n)$.

We now establish the upper bound of $(Z_n^{(2)})$. Let $|||\cdot|||$ denote the matrix
norm associated with the Euclidean vector norm. We have

$$\|Z_n^{(2)}\| \le \sum_{k=1}^n |||e^{(u_n - u_k)H}|||\,\beta_k\|\nu_k\|,$$

and the application of Proposition 3.1.2 of [9] ensures that, for all
$T \in\, ]0, \Lambda^{(H)}[$,

$$\|Z_n^{(2)}\| = O\Big(\sum_{k=1}^n e^{-(u_n - u_k)T}\beta_k x_k\Big) \quad \text{a.s.}$$

The upper bound of $(Z_n^{(2)})$ then follows straightforwardly from the one
obtained for $(Z_n^{(1)})$. □
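The recursive equation used in the case $b < 1$ can be checked against the definition of $Z_n^{(1)}$. The sketch below is only an illustration under assumptions of our own: $\beta_k = k^{-0.8}$, $x_k = k^{-0.5}$ and $T = 1$; the function names are hypothetical.

```python
import math

def z_direct(n, beta, x, t):
    """Z_n = exp(-u_n t) * sum_{k<=n} exp(u_k t) beta_k x_k, u_k = beta_1+...+beta_k."""
    u, partial = 0.0, 0.0
    for k in range(1, n + 1):
        u += beta(k)
        partial += math.exp(u * t) * beta(k) * x(k)
    return math.exp(-u * t) * partial

def z_recursive(n, beta, x, t):
    """Same quantity through Z_k = exp(-beta_k t) Z_{k-1} + beta_k x_k, Z_0 = 0."""
    z = 0.0
    for k in range(1, n + 1):
        z = math.exp(-beta(k) * t) * z + beta(k) * x(k)
    return z

def beta(k):
    return k ** (-0.8)      # step size with b = 0.8 < 1 (illustrative)

def x(k):
    return k ** (-0.5)      # positive sequence x_k (illustrative)

n, t = 300, 1.0
zd, zr = z_direct(n, beta, x, t), z_recursive(n, beta, x, t)
print(zd, zr, zr / x(n))
```

The last printed value is the ratio $Z_n^{(1)}/x_n$, which Lemma 9 asserts to be bounded when $w_n = x_n$ satisfies Condition (C).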

Proof of Lemma 10. The proof is straightforward by following the
proof of Lemma 9 in the case $b < 1$. □



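A.2. Proof of Lemma 1. Set

$$M_j^{(n)} = \begin{pmatrix} \sqrt{\beta_n^{-1}}\,e^{u_n H}\sum_{k=1}^{j} e^{-u_k H}\beta_k (V_{k+1} - Q_{12}Q_{22}^{-1}W_{k+1}) \\ \sqrt{\gamma_n^{-1}}\,e^{s_n Q_{22}}\sum_{k=1}^{j} e^{-s_k Q_{22}}\gamma_k W_{k+1} \end{pmatrix}.$$

For each $n$, $M^{(n)} = (M_j^{(n)})_{j \ge 1}$ is a martingale whose increasing process
satisfies

$$\langle M \rangle_n^{(n)} = \begin{pmatrix} A_{1,n} & A_{2,n} \\ A_{2,n}^T & A_{4,n} \end{pmatrix}$$

with, in view of assumption (A4),

$$A_{1,n} = \beta_n^{-1}e^{u_n H}\Big[\sum_{k=1}^n \beta_k^2 e^{-u_k H}\Gamma_\theta e^{-u_k H^T} + o\Big(\sum_{k=1}^n \beta_k^2 e^{-u_k H}\Gamma_\theta e^{-u_k H^T}\Big)\Big]e^{u_n H^T},$$

$$A_{2,n} = \sqrt{\beta_n^{-1}\gamma_n^{-1}}\,e^{u_n H}\Big[\sum_{k=1}^n \beta_k\gamma_k e^{-u_k H}\Gamma_{1,2}e^{-s_k Q_{22}^T} + o\Big(\sum_{k=1}^n \beta_k\gamma_k e^{-u_k H}\Gamma_{1,2}e^{-s_k Q_{22}^T}\Big)\Big]e^{s_n Q_{22}^T},$$

$$A_{4,n} = \gamma_n^{-1}e^{s_n Q_{22}}\Big[\sum_{k=1}^n \gamma_k^2 e^{-s_k Q_{22}}\Gamma_{22}e^{-s_k Q_{22}^T} + o\Big(\sum_{k=1}^n \gamma_k^2 e^{-s_k Q_{22}}\Gamma_{22}e^{-s_k Q_{22}^T}\Big)\Big]e^{s_n Q_{22}^T}.$$

The application of Lemma 4 in [19] ensures that

$$\lim_{n\to\infty} A_{1,n} = \Sigma_\theta \quad\text{and}\quad \lim_{n\to\infty} A_{4,n} = \Sigma_\mu.$$

Moreover, we note that

$$|||A_{2,n}||| = O\Big(\sqrt{\beta_n^{-1}\gamma_n^{-1}}\sum_{k=1}^n \beta_k\gamma_k\,|||e^{(u_n - u_k)H}|||\;|||e^{(s_n - s_k)Q_{22}}|||\Big).$$

Set $T \in\, ]0, \Lambda^{(H)}[$ and $T' \in\, ]0, \Lambda^{(Q_{22})}[$; the application of
Proposition 3.1.2 in [9] ensures that

$$|||A_{2,n}||| = O\Big(\sqrt{\beta_n^{-1}\gamma_n^{-1}}\sum_{k=1}^n \beta_k\gamma_k e^{-T(u_n - u_k)}e^{-T'(s_n - s_k)}\Big)$$

and the application of Lemma 10 gives

$$|||A_{2,n}||| = O\big(\sqrt{\beta_n^{-1}\gamma_n^{-1}}\,\beta_n\big) = O\big(\sqrt{\beta_n\gamma_n^{-1}}\big).$$

In view of (A3), it follows that $\lim_{n\to\infty} A_{2,n} = 0$, and we thus obtain

$$\lim_{n\to\infty} \langle M \rangle_n^{(n)} = \begin{pmatrix} \Sigma_\theta & 0 \\ 0 & \Sigma_\mu \end{pmatrix} \quad \text{a.s.}$$

Now, set $T \in\, ]\mathbb{1}_{b=1}/(2\beta_0), \Lambda^{(H)}[$ and $T' \in\, ]0, \Lambda^{(Q_{22})}[$; in view of
assumption (A4), we have

$$\sum_{k=1}^n E\big[\|M_k^{(n)} - M_{k-1}^{(n)}\|^m \,\big|\, \mathcal{F}_{k-1}\big] = O\Big(\sum_{k=1}^n \beta_n^{-m/2}\beta_k^m\,|||e^{(u_n - u_k)H}|||^m + \sum_{k=1}^n \gamma_n^{-m/2}\gamma_k^m\,|||e^{(s_n - s_k)Q_{22}}|||^m\Big)$$
$$= O\Big(\sum_{k=1}^n \beta_n^{-m/2}\beta_k^m e^{-mT(u_n - u_k)} + \sum_{k=1}^n \gamma_n^{-m/2}\gamma_k^m e^{-mT'(s_n - s_k)}\Big) \quad \text{a.s.},$$

where the latter upper bound follows from the application of Proposition
3.1.2 in [9]. The application of Lemmas 9 and 10 then ensures that, for all
$T^* \in\, ]\mathbb{1}_{b=1}/(2\beta_0), T[$,

$$\sum_{k=1}^n E\big[\|M_k^{(n)} - M_{k-1}^{(n)}\|^m \,\big|\, \mathcal{F}_{k-1}\big] = O\big(\beta_n^{-m/2}[e^{-mT^* u_n}\mathbb{1}_{b=1} + \beta_n^{m-1}] + \gamma_n^{-m/2 + (m-1)}\big) \quad \text{a.s.},$$

so that

$$\lim_{n\to\infty}\sum_{k=1}^n E\big[\|M_k^{(n)} - M_{k-1}^{(n)}\|^m \,\big|\, \mathcal{F}_{k-1}\big] = 0 \quad \text{a.s.}$$

The application of Lyapounov's theorem then gives

$$M_n^{(n)} \xrightarrow{\mathcal{D}} \mathcal{N}\Big(0, \begin{pmatrix} \Sigma_\theta & 0 \\ 0 & \Sigma_\mu \end{pmatrix}\Big),$$

which concludes the proof of Lemma 1.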

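A.3. Proof of Lemma 5. We first note that, in view of (15), we have

$$R_{n+1}^{(\theta)} = \beta_n\gamma_n^{-1}Q_{12}Q_{22}^{-1}\mu_{n+1} + e^{u_n H}\sum_{k=2}^n \big[e^{-u_{k-1}H}\beta_{k-1}\gamma_{k-1}^{-1} - e^{-u_k H}\beta_k\gamma_k^{-1}\big]Q_{12}Q_{22}^{-1}\mu_k - e^{(u_n - u_1)H}\beta_1\gamma_1^{-1}Q_{12}Q_{22}^{-1}\mu_1$$
$$= \beta_n\gamma_n^{-1}Q_{12}Q_{22}^{-1}\mu_{n+1} + e^{u_n H}\sum_{k=2}^n e^{-u_k H}\beta_k U_k - e^{(u_n - u_1)H}\beta_1\gamma_1^{-1}Q_{12}Q_{22}^{-1}\mu_1,$$

where

$$U_k = \gamma_k^{-1}\big[e^{\beta_k H}\beta_{k-1}\gamma_{k-1}^{-1}\beta_k^{-1}\gamma_k - I\big]Q_{12}Q_{22}^{-1}\mu_k.$$

It follows that

$$\|R_{n+1}^{(\theta)}\| = O\Big(\beta_n\gamma_n^{-1}\|\mu_{n+1}\| + \Big\|e^{u_n H}\sum_{k=2}^n e^{-u_k H}\beta_k U_k\Big\| + |||e^{u_n H}|||\Big).$$

Note that

$$\|U_n\| = O(\gamma_n^{-1}\beta_n\|\mu_n\|) = O(\gamma_n^{-1}\beta_n w_n) \quad \text{a.s.}$$

Since the sequence $(w_n)$ satisfies Condition (C), the sequence $(\gamma_n^{-1}\beta_n w_n)$
satisfies Condition (C); it follows from the application of Lemma 9 that, for
all $t \in\, ]0, \Lambda^{(H)}[$, we have

$$\|R_{n+1}^{(\theta)}\| = O(\beta_n\gamma_n^{-1}w_n + e^{-u_n t}\mathbb{1}_{b=1}) + O(|||e^{u_n H}|||) \quad \text{a.s.}$$

Now, the application of Proposition 3.1.2 in [9] ensures that, for all
$t \in\, ]0, \Lambda^{(H)}[$, $|||e^{u_n H}||| = O(e^{-u_n t})$; it follows that, for all
$t \in\, ]0, \Lambda^{(H)}[$,

$$\|R_{n+1}^{(\theta)}\| = O(\beta_n\gamma_n^{-1}w_n + e^{-u_n t}) \quad \text{a.s.}$$

and thus, for all $s \in\, ]0, \beta_0\Lambda^{(H)}[$, we obtain

$$\|R_{n+1}^{(\theta)}\| = O(\beta_n\gamma_n^{-1}w_n + n^{-s}) \quad \text{a.s.},$$

which proves the first part of Lemma 5.

In view of (18), we note that

$$R_{n+1}^{(\mu)} = e^{s_n Q_{22}}\sum_{k=1}^n e^{-s_k Q_{22}}\gamma_k \nu_k,$$

with, by application of Lemma 2 and of the first part of Lemma 5,

$$\|\nu_n\| = O(\|L_n^{(\theta)}\| + \|R_n^{(\theta)}\|)$$
$$= O(\sqrt{\beta_n \log u_n} + \beta_n\gamma_n^{-1}w_n + n^{-s}) \quad \text{a.s.}$$
$$= O(\sqrt{\beta_n \log u_n} + \beta_n\gamma_n^{-1}w_n) \quad \text{a.s.}$$

Since the sequence $(w_n)$ satisfies Condition (C), the sequences
$(\sqrt{\beta_n \log u_n})$ and $(\beta_n\gamma_n^{-1}w_n)$ satisfy Condition (C); the application
of Lemma 10 gives

$$\|R_{n+1}^{(\mu)}\| = O(\sqrt{\beta_n \log u_n} + \beta_n\gamma_n^{-1}w_n) \quad \text{a.s.},$$

which concludes the proof of Lemma 5.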

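A.4. Technical details for the proof of Lemma 6.

A.4.1. Proof of (20) and (21). Noting that, in view of (14) and (15), we
have

$$L_{n+1}^{(\theta)} = \beta_n(V_{n+1} - Q_{12}Q_{22}^{-1}W_{n+1}) + e^{\beta_n H}L_n^{(\theta)},$$
$$R_{n+1}^{(\theta)} = \beta_n Q_{12}Q_{22}^{-1}\gamma_n^{-1}[\mu_{n+1} - \mu_n] + e^{\beta_n H}R_n^{(\theta)},$$

and using (16) and then (13), we write

$$\Delta_{n+1}^{(\theta)} = \theta_{n+1} - L_{n+1}^{(\theta)} - R_{n+1}^{(\theta)}$$
$$= \theta_n + \beta_n H\theta_n + \beta_n\big([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]\big) - e^{\beta_n H}L_n^{(\theta)} - e^{\beta_n H}R_n^{(\theta)}$$
$$= (I + \beta_n H)\theta_n - [I + \beta_n H + O(\beta_n^2)][L_n^{(\theta)} + R_n^{(\theta)}] + \beta_n\big([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]\big)$$
$$= (I + \beta_n H)\Delta_n^{(\theta)} + O(\beta_n^2)[L_n^{(\theta)} + R_n^{(\theta)}] + \beta_n\big([\rho_n^{(\theta)} + r_n^{(\theta)}] - Q_{12}Q_{22}^{-1}[\rho_n^{(\mu)} + r_n^{(\mu)}]\big),$$

which proves (20). Similarly, we note that, in view of (11), (17), (18) and
(19), we have

$$\Delta_{n+1}^{(\mu)} = \mu_n + \gamma_n(Q_{21}\theta_n + Q_{22}\mu_n + \rho_n^{(\mu)} + r_n^{(\mu)} + W_{n+1}) - \gamma_n W_{n+1} - e^{\gamma_n Q_{22}}L_n^{(\mu)}$$
$$\qquad - \gamma_n Q_{21}[L_n^{(\theta)} + R_n^{(\theta)}] - e^{\gamma_n Q_{22}}R_n^{(\mu)}.$$

Using (16), it follows that

$$\Delta_{n+1}^{(\mu)} = \mu_n + \gamma_n(Q_{21}\theta_n + Q_{22}\mu_n + \rho_n^{(\mu)} + r_n^{(\mu)}) - e^{\gamma_n Q_{22}}L_n^{(\mu)} - \gamma_n Q_{21}[\theta_n - \Delta_n^{(\theta)}] - e^{\gamma_n Q_{22}}R_n^{(\mu)}$$
$$= (I + \gamma_n Q_{22})\mu_n - [I + \gamma_n Q_{22} + O(\gamma_n^2)][L_n^{(\mu)} + R_n^{(\mu)}] + \gamma_n[\rho_n^{(\mu)} + r_n^{(\mu)} + Q_{21}\Delta_n^{(\theta)}]$$
$$= (I + \gamma_n Q_{22})\Delta_n^{(\mu)} + O(\gamma_n^2)[L_n^{(\mu)} + R_n^{(\mu)}] + \gamma_n[\rho_n^{(\mu)} + r_n^{(\mu)} + Q_{21}\Delta_n^{(\theta)}],$$

which proves (21).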

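A.4.2. Proof of (29). In view of (28), we have

$$\|\Delta_{n+1}^{(\theta)}\|_T \le (1 - \beta_n T^{**})\|\Delta_n^{(\theta)}\|_T + \beta_n z_n + \frac{\beta_n}{\gamma_n M^*}\big[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M\big],$$

where $(z_n)$ is a nonnegative sequence such that

$$z_n = O\big(\beta_n\|L_n^{(\theta)}\| + \beta_n\|R_n^{(\theta)}\| + \|r_n^{(\theta)}\| + \|r_n^{(\mu)}\| + \gamma_n\|L_n^{(\mu)}\| + \gamma_n\|R_n^{(\mu)}\|$$
$$\qquad + \|L_n^{(\theta)}\|^2 + \|R_n^{(\theta)}\|^2 + \|L_n^{(\mu)}\|^2 + \|R_n^{(\mu)}\|^2\big).$$

For $n \ge 1$, set $\pi_n = \prod_{k=1}^n (1 - \beta_k T^{**})$ and $\pi_0 = 1$. We note that

$$\frac{\|\Delta_{n+1}^{(\theta)}\|_T}{\pi_n} \le \frac{\|\Delta_n^{(\theta)}\|_T}{\pi_{n-1}} + \frac{\beta_n}{\pi_n}z_n + \frac{\beta_n}{\pi_n\gamma_n M^*}\big[\|\Delta_n^{(\mu)}\|_M - \|\Delta_{n+1}^{(\mu)}\|_M\big],$$

so that, summing these inequalities and applying a summation by parts to the
last term,

$$\frac{\|\Delta_{n+1}^{(\theta)}\|_T}{\pi_n} \le \|\Delta_1^{(\theta)}\|_T + \frac{\beta_1}{\pi_1\gamma_1 M^*}\|\Delta_1^{(\mu)}\|_M + \sum_{k=1}^n \frac{\beta_k}{\pi_k}z_k + \frac{1}{M^*}\sum_{k=2}^n \Big[\frac{\beta_k}{\pi_k\gamma_k} - \frac{\beta_{k-1}}{\pi_{k-1}\gamma_{k-1}}\Big]\|\Delta_k^{(\mu)}\|_M.$$

Since $\pi_k/\pi_{k-1} = 1 - \beta_k T^{**}$ and since, in view of (A3),
$\beta_{k-1}\gamma_k/(\beta_k\gamma_{k-1}) = 1 + O(\beta_k)$, there exists $c > 0$ such that

$$\|\Delta_{n+1}^{(\theta)}\|_T \le \pi_n\Big[\|\Delta_1^{(\theta)}\|_T + \frac{\beta_1}{\pi_1\gamma_1 M^*}\|\Delta_1^{(\mu)}\|_M + \sum_{k=1}^n \frac{\beta_k}{\pi_k}\Big(z_k + c\,\frac{\beta_k}{\gamma_k}\|\Delta_k^{(\mu)}\|_M\Big)\Big].$$

Noting that $\pi_n/\pi_k \le e^{-T^{**}(u_n - u_k)}$, it follows that

$$\|\Delta_{n+1}^{(\theta)}\|_T = O\Big(e^{-T^{**}u_n} + \sum_{k=1}^n e^{-T^{**}(u_n - u_k)}\beta_k\Big[z_k + \frac{\beta_k}{\gamma_k}\|\Delta_k^{(\mu)}\|_M\Big]\Big),$$

which can be rewritten as

$$\|\Delta_{n+1}^{(\theta)}\|_T = O\Big(e^{-T^{**}u_n} + e^{-T^{**}u_n}\sum_{k=1}^n e^{T^{**}u_k}\beta_k\big[\beta_k\|L_k^{(\theta)}\| + \beta_k\|R_k^{(\theta)}\| + \|r_k^{(\theta)}\| + \|r_k^{(\mu)}\|\big]$$
$$\qquad + e^{-T^{**}u_n}\sum_{k=1}^n e^{T^{**}u_k}\beta_k\big[\gamma_k\|L_k^{(\mu)}\| + \gamma_k\|R_k^{(\mu)}\| + \|L_k^{(\theta)}\|^2 + \|R_k^{(\theta)}\|^2 + \|L_k^{(\mu)}\|^2 + \|R_k^{(\mu)}\|^2\big]$$
$$\qquad + e^{-T^{**}u_n}\sum_{k=1}^n e^{T^{**}u_k}\beta_k\,\frac{\beta_k}{\gamma_k}\|\Delta_k^{(\mu)}\|_M\Big).$$

Replacing $\|r_k^{(\theta)}\|$ and $\|r_k^{(\mu)}\|$ by their upper bounds given in assumption
(A4)(iv), $\|L_k^{(\theta)}\|$ and $\|L_k^{(\mu)}\|$ by their upper bounds given by Lemma 2,
$\|R_k^{(\theta)}\|$ and $\|R_k^{(\mu)}\|$ by their upper bounds given by Lemma 5, $\|\Delta_k^{(\mu)}\|$ by
$\delta_k^{(\mu)}$, and doing some straightforward simplifications, we obtain

$$\|\Delta_{n+1}^{(\theta)}\|_T = O\Big(e^{-T^{**}u_n} + e^{-T^{**}u_n}\sum_{k=1}^n e^{T^{**}u_k}\beta_k\big[\beta_k^2\gamma_k^{-2}w_k^2 + \gamma_k\log s_k + \beta_k\gamma_k^{-1}\delta_k^{(\mu)}\big]$$
$$\qquad + e^{-T^{**}u_n}\sum_{k=1}^n e^{T^{**}u_k}\beta_k\,o(\sqrt{\beta_k})\Big).$$

Now, since the sequences $(w_n)$ and $(\delta_n^{(\mu)})$ satisfy Condition (C), the
sequences $(\beta_n^2\gamma_n^{-2}w_n^2)$, $(\gamma_n\log s_n)$ and $(\beta_n\gamma_n^{-1}\delta_n^{(\mu)})$ satisfy
Condition (C). Moreover, the sequence $(\beta_n^{1/2})$ satisfies Condition (C). The
application of Lemma 9 then ensures that, for all $T' \in\, ]0, T^{**}[$,

$$\|\Delta_{n+1}^{(\theta)}\|_T = O\big(e^{-T^{**}u_n} + e^{-T'u_n}\mathbb{1}_{b=1} + \beta_n^2\gamma_n^{-2}w_n^2 + \gamma_n\log s_n + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}\big) + o(\sqrt{\beta_n}).$$

Let us recall that $T^{**}$ has been set such that $T^{**} > \mathbb{1}_{b=1}/(2\beta_0)$, and note
that $T'$ can be chosen such that $e^{-T^{**}u_n} + e^{-T'u_n} = o(\sqrt{\beta_n})$. Since
$\gamma_n\log s_n = o(\sqrt{\beta_n})$, it follows that

$$\|\Delta_{n+1}^{(\theta)}\|_T = O\big(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}\big) + o(\sqrt{\beta_n}),$$

which proves (29).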
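The inequality $\pi_n/\pi_k \le e^{-T^{**}(u_n - u_k)}$ used in this proof is a consequence of $1 - x \le e^{-x}$. A minimal numerical check is sketched below, with assumed values $T^{**} = 0.5$ and $\beta_k = k^{-0.8}$ chosen only so that every factor $1 - \beta_k T^{**}$ is positive.

```python
import math

t_star, b, n = 0.5, 0.8, 200
beta = [k ** (-b) for k in range(1, n + 1)]   # beta[k-1] holds beta_k

# Prefix products pi_k = prod_{j<=k} (1 - beta_j T**) and sums u_k = sum_{j<=k} beta_j.
pi, u = [], []
prod, total = 1.0, 0.0
for bk in beta:
    prod *= 1.0 - bk * t_star
    total += bk
    pi.append(prod)
    u.append(total)

# Largest value of pi_n/pi_k - exp(-T**(u_n - u_k)) over k = 1..n.
worst = max(pi[n - 1] / pi[k] - math.exp(-t_star * (u[n - 1] - u[k]))
            for k in range(n))
print(worst)
```

The printed value is nonpositive up to rounding, the maximum being attained at $k = n$ where both sides equal one.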

A.4.3. Proof of (30). In view of (27), we have

$$\|\Delta_{n+1}^{(\mu)}\|_M = O\Big(e^{-M^* s_n} + e^{-M^* s_n}\sum_{k=1}^n e^{M^* s_k}\gamma_k\big[\gamma_k\|L_k^{(\mu)}\| + \gamma_k\|R_k^{(\mu)}\| + \|r_k^{(\mu)}\|$$
$$\qquad + \|L_k^{(\theta)}\|^2 + \|R_k^{(\theta)}\|^2 + \|L_k^{(\mu)}\|^2 + \|R_k^{(\mu)}\|^2 + \|\Delta_k^{(\theta)}\|_T\big]\Big).$$

Replacing $\|r_k^{(\mu)}\|$ by its upper bound given in Assumption (A4)(iv), $\|L_k^{(\theta)}\|$
and $\|L_k^{(\mu)}\|$ by their upper bounds given by Lemma 2, $\|R_k^{(\theta)}\|$ and $\|R_k^{(\mu)}\|$ by
their upper bounds given by Lemma 5, $\|\Delta_k^{(\theta)}\|_T$ by its upper bound given
in (29), and doing some straightforward simplifications, we deduce that

$$\|\Delta_{n+1}^{(\mu)}\|_M = O\Big(e^{-M^* s_n} + e^{-M^* s_n}\sum_{k=1}^n e^{M^* s_k}\gamma_k\big[\beta_k^2\gamma_k^{-2}w_k^2 + \gamma_k\log s_k + \beta_k\gamma_k^{-1}\delta_k^{(\mu)}\big]$$
$$\qquad + e^{-M^* s_n}\sum_{k=1}^n e^{M^* s_k}\gamma_k\,o(\sqrt{\beta_k})\Big).$$

Since the sequences $(w_n)$ and $(\delta_n^{(\mu)})$ satisfy Condition (C), the sequences
$(\beta_n^2\gamma_n^{-2}w_n^2)$, $(\gamma_n\log s_n)$, $(n^{-s})$ and $(\beta_n\gamma_n^{-1}\delta_n^{(\mu)})$ satisfy Condition (C).
Moreover, the sequence $(\beta_n^{1/2})$ satisfies Condition (C). The application of
Lemma 10 then ensures that

$$\|\Delta_{n+1}^{(\mu)}\|_M = O\big(e^{-M^* s_n} + \beta_n^2\gamma_n^{-2}w_n^2 + \gamma_n\log s_n + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}\big) + o(\sqrt{\beta_n}).$$

Noting that $e^{-M^* s_n} + \gamma_n\log s_n = o(\sqrt{\beta_n})$, it follows that

$$\|\Delta_{n+1}^{(\mu)}\|_M = O\big(\beta_n^2\gamma_n^{-2}w_n^2 + \beta_n\gamma_n^{-1}\delta_n^{(\mu)}\big) + o(\sqrt{\beta_n}),$$

which concludes the proof of (30).